Abstract

We find the asymptotic number of connected graphs with $k$ vertices and $k - 1 + l$ edges when $k, l$
approach infinity, reproving a result of Bender, Canfield and McKay. We use the probabilistic method,
analyzing breadth-first search on the random graph $G(k,p)$ for an appropriate edge probability $p$.
Central is the analysis of a random walk with fixed beginning and end which is tilted to the left.

1 The Main Results

In this paper, we investigate the number of graphs with a given complexity. Here, the complexity of a
graph is its number of edges minus its number of vertices plus one. For $k, l \geq 0$, we write $C(k, l)$ for the
number of labeled connected graphs with $k$ vertices and complexity $l$.
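For small $k$ the quantity $C(k,l)$ can be computed exactly, which is handy for sanity checks. The sketch below is not the method of this paper: it assumes the standard recurrence that subtracts, from all graphs with a given number of edges, those in which the component of one fixed vertex is a proper subset. The function names are illustrative.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def connected_with_edges(n: int, m: int) -> int:
    """Number of connected labeled graphs on n vertices with exactly m edges."""
    if n == 1:
        return 1 if m == 0 else 0
    total = comb(comb(n, 2), m)  # all labeled graphs with m edges
    # Remove graphs in which the component of vertex 1 has j < n vertices
    # and i edges; the other n - j vertices carry the remaining m - i edges.
    for j in range(1, n):
        for i in range(0, m + 1):
            total -= (comb(n - 1, j - 1)
                      * connected_with_edges(j, i)
                      * comb(comb(n - j, 2), m - i))
    return total


def C(k: int, l: int) -> int:
    """C(k, l): connected labeled graphs with k vertices and complexity l."""
    return connected_with_edges(k, k - 1 + l)
```

With complexity $l = 0$ this reproduces Cayley's count $k^{k-2}$ of labeled trees.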

The study of $C(k,l)$ has a long history. Cayley's Theorem gives the exact formula for the number
of trees, $C(k,0) = k^{k-2}$. The asymptotic formula for the number of unicyclic graphs, $C(k,1)$, has been
given by Renyi [2] and others. Wright [11] gave the asymptotics of $C(k,l)$ for $l$ arbitrary but fixed and
$k \to \infty$, and also studied the asymptotic behavior of $C(k,l)$ when $l = o(k^{1/3})$ in [12].

The asymptotics of $C(k,l)$ for all $k,l \to \infty$ were found by Bender, Canfield, and McKay [3]. The
proof in [3] is based on an explicit recursive formula for $C(k,l)$. In this paper, we give an alternate, and
substantially different, derivation of the Bender, Canfield, McKay results. Our argument is Erdos Magic,
using the study of the random graph $G(k,p)$ to find the asymptotics of the strictly enumerative $C(k,l)$.
The critical idea, given in Theorem 1.1 below, involves an analysis of a breadth-first search algorithm on
$G(k,p)$. Similar methods, with somewhat weaker results in our cases, were employed recently by
Coja-Oghlan, Moore and Sanwalani [4]. We can also use the results and methodology in this paper to find
local statistics on the joint distribution of the size and complexity of the dominant component of G(n,p) 
in the supercritical regime, which we defer to a future publication jSJ. Further, while computational 
issues are not addressed in our current work, these methods may be used to efficiently generate a random 
connected graph of given size and complexity [Q. The idea of using the random graph to study C(k,l) 
has appeared previously. In [Hj, a reformulation in terms of random graphs was used to prove upper and 
lower bounds on C(k,l), extending the upper bound in 0. The idea in |S] is that the expected value 
of the number of connected components with k vertices and complexity I can be explicitly written in 
terms of $C(k,l)$. Bounds on the random number of such components then imply bounds on $C(k,l)$. In
[10], a more sophisticated analysis was performed, where the connected component of a given node in the
random graph was explored using breadth-first search. This analysis allows one to rewrite the asymptotics
of $C(k,l)$ for $l$ fixed in terms of Brownian excursions. Interestingly, this identifies the Wright constants
$c_l$ for the asymptotics $C(k,l) \sim c_l k^{k-2} k^{3l/2}$ in terms of moments of the mean distance from the origin of
a Brownian excursion. These moments were also investigated in [J], but the connection to the Wright
constants had not been made before.

In this case, we will use the breadth-first search representation of connected components in $G(k,p)$ for
$C(k,l)$, choose $p$ appropriately, and analyse the resulting problem using probabilistic means. The critical
identity is Theorem 1.1, which rewrites $C(k,l)$ in terms of a $k$-step conditioned random walk with
steps that are Poisson random variables minus one, and where the parameter of the Poisson steps varies
with time. The main work in this paper then lies in the study of this random walk.

We note that when $l > k \ln k$ (and even somewhat less) the asymptotics of $C(k,l)$ are trivial as
asymptotically almost all graphs on $k$ vertices and $k - 1 + l$ edges will be connected. Thus, while our
methods extend further, we shall restrict ourselves to finding the asymptotics of $C(k,l)$ with $l < k \ln k$
and $l \to \infty$. It will be convenient to subdivide the possible $l$ into three regimes:

1. Very Large: $l \gg k$ and $l < k \ln k$;

2. Large: $l = \Theta(k)$;

3. Small: $l \ll k$ and $l \to \infty$.

The main ideas of this paper are given in Section 1, where we state the main results, Theorems 1.2
and 1.5. The consequences of our main results for $C(k,l)$ are formulated in Section 2. The proofs of
Theorems 1.2 and 1.5 are given in Section 3.



1.1 Tilted Balls into Bins 

Let $k \geq 2$ be an integer. In this section, we define a process of placing $k - 1$ balls into $k$ bins with a tilted
distribution, which makes it more likely that balls are placed in the left bins. In Section 1.2 below, we
will find an identity between this bin process and $C(k, l)$.

Let $p \in (0, 1]$. We have $k - 1$ balls $1 \leq j \leq k - 1$ and $k$ bins $1 \leq i \leq k$. We place the $j$-th ball into bin
$T_j$, where $T_j$ is a random variable with distribution given by

$$\Pr[T_j = i] = \frac{p(1-p)^{i-1}}{1-(1-p)^k}. \qquad (1.1)$$

This is a truncated geometric distribution. Note that the larger $p$ is, the more the $T_j$ are tilted to the left.
We shall call $p$ the tilt of the distribution. The $T_j$ are independent and identically distributed. Let $Z_i$,
$1 \leq i \leq k$, denote the number of balls in the $i$-th bin. Set $Y_0 = 1$ and $Y_i = Y_{i-1} + Z_i - 1$ for $1 \leq i \leq k$.
Note that $Y_k = 0$ as there are precisely $k - 1$ balls. Let TREE be the event that

$$Y_t > 0 \quad \text{for } 1 \leq t \leq k - 1, \qquad (1.2)$$

or, alternatively,

$$Z_1 + \cdots + Z_t \geq t \quad \text{for } 1 \leq t \leq k - 1. \qquad (1.3)$$

We note that we use the term TREE because there is a natural bijection between placements of $k - 1$ balls
into $k$ bins satisfying TREE and trees on $k$ vertices. Alternatively, one may consider $Y_0, Y_1, \ldots, Y_k$, with
$Y_0 = 1$, $Y_k = 0$, as a walk with fixed endpoints, or a bridge. The condition TREE can then be interpreted
as saying that the bridge is an excursion. In certain limiting situations the bridge approaches a biased
Brownian bridge, where the bias depends on the parameter $p$.
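The balls-into-bins process can be explored directly for small $k$. The following is a minimal sketch, assuming the truncated geometric law $\Pr[T_j = i] = p(1-p)^{i-1}/(1-(1-p)^k)$ and the walk $Y_0 = 1$, $Y_i = Y_{i-1} + Z_i - 1$; all function names are illustrative, and the exhaustive enumeration is only feasible for very small $k$.

```python
from fractions import Fraction
from itertools import product


def truncated_geometric_pmf(k, p):
    """Pr[T = i] for i = 1..k; entry i-1 of the returned list."""
    norm = 1 - (1 - p) ** k
    return [p * (1 - p) ** (i - 1) / norm for i in range(1, k + 1)]


def walk(placement, k):
    """Return [Y_0, ..., Y_k] for a placement (T_1, ..., T_{k-1}) of the balls."""
    Z = [0] * (k + 1)              # Z[i] = number of balls in bin i (1-based)
    for t in placement:
        Z[t] += 1
    Y = [1]                        # Y_0 = 1
    for i in range(1, k + 1):
        Y.append(Y[-1] + Z[i] - 1)
    return Y


def is_tree(placement, k):
    """TREE event: Y_t > 0 for 1 <= t <= k-1."""
    Y = walk(placement, k)
    return all(Y[t] > 0 for t in range(1, k))


def prob_tree(k, p):
    """Exact Pr[TREE] by enumerating all k^(k-1) placements."""
    pmf = truncated_geometric_pmf(k, p)
    total = Fraction(0)
    for placement in product(range(1, k + 1), repeat=k - 1):
        if is_tree(placement, k):
            weight = Fraction(1)
            for t in placement:
                weight *= pmf[t - 1]
            total += weight
    return total
```

Passing `p` as a `Fraction` keeps every probability exact. Counting the placements that satisfy TREE for a given $k$ recovers $k^{k-2}$, in line with the bijection to labeled trees.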
Definition 1. Set

$$M = \binom{k}{2} - \sum_{j=1}^{k-1} T_j \qquad (1.4)$$

in the probability space in which the $T_j$ are independent with distribution given by (1.1). Set $M^*$ equal to
the same random variable but in the above probability space conditioned on the event TREE.

We can give an alternative definition of $M$ as follows:

$$M = \binom{k}{2} - \sum_{j=1}^{k-1} T_j = \sum_{i=1}^{k-1} (Y_i - 1), \qquad (1.5)$$

which can be seen by noting that both sides of (1.5) increase by one when one ball is moved one position
to the left and decrease by one when one ball is moved one position to the right. Since one can get from
any placement to any other placement via a series of these moves, the two sides of (1.5) must differ by a
constant. However, when $T_j = j$ for $1 \leq j \leq k - 1$, we have $Y_i = 1$ for $1 \leq i \leq k - 1$ and $M = 0$, and so
the sides are equal for this placement of balls.
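The identity between the two descriptions of $M$ is easy to confirm by brute force. The sketch below assumes the reading $M = \binom{k}{2} - \sum_j T_j$ for the ball description and $\sum_{i=1}^{k-1}(Y_i - 1)$ for the walk description; the helper names are illustrative.

```python
from itertools import product
from math import comb


def m_from_balls(placement, k):
    """M computed from the ball positions T_1, ..., T_{k-1}."""
    return comb(k, 2) - sum(placement)


def m_from_walk(placement, k):
    """M computed as the sum of (Y_i - 1) over i = 1, ..., k-1."""
    Z = [0] * (k + 1)
    for t in placement:
        Z[t] += 1
    total, y = 0, 1              # y starts at Y_0 = 1
    for i in range(1, k):
        y += Z[i] - 1            # Y_i = Y_{i-1} + Z_i - 1
        total += y - 1
    return total


def identity_holds(k):
    """Check the identity over every placement of k-1 balls into k bins."""
    return all(m_from_balls(pl, k) == m_from_walk(pl, k)
               for pl in product(range(1, k + 1), repeat=k - 1))
```

Note that the check runs over all placements, not only those satisfying TREE, since the identity does not depend on the conditioning.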

1.2 The Critical Identity 

The main idea of our approach is given in Theorem 1.1 below. Note that this result is exact; there are
no asymptotics.

Theorem 1.1. For all $k, l \in \mathbb{N}$, $p \in (0, 1]$,

$$A_1 A_2 A_3 = C(k, l)\, p^{k+l-1} (1 - p)^{\binom{k}{2} - (k+l-1)}, \qquad (1.6)$$

where

$$A_1 = (1 - (1 - p)^k)^{k-1}, \qquad (1.7)$$

$$A_2 = \Pr[\mathrm{TREE}], \qquad (1.8)$$

$$A_3 = \Pr[\mathrm{BIN}[M^*, p] = l]. \qquad (1.9)$$

Proof. The right hand side of (1.6) is the probability that $G(k,p)$ is connected and has complexity $l$. We
show that the left hand side of (1.6) also gives this probability. Designate a root vertex $v$ and label the
other vertices $1 \leq j \leq k - 1$. We analyze breadth-first search on $G$, starting with root $v$. (More precisely,
the queue is initially $\{v\}$. In Stage 1 we pop $v$ off the queue and add to the queue the neighbors of $v$.
Each successive stage we pop a vertex off the queue and add to the queue its neighbors that haven't
already been in the queue. The process stops when the queue is empty.)

Each non-root $j$ flips a coin $k$ times, heads with probability $p$. The $i$-th flip being heads means that, if
the $i$-th stage of the breadth-first search is reached and $j$ has not yet entered the queue, then $j$ is adjacent
to the "popped" vertex. To get all vertices it is necessary that each $j$ has at least one head. This happens
with probability $A_1$. Conditioning on that, we let $T_j$ be that first $i$ when $j$ had a head. So $T_j$ has the
truncated geometric distribution of (1.1). While the process continues, $Y_t$ is the size of the queue. The
condition that the process doesn't terminate before stage $k$ is precisely that no $Y_t = 0$ for $1 \leq t \leq k - 1$,
which is TREE, so this gives $A_2$. Now the only pairs $\{w_1, w_2\}$ whose adjacency has not been determined are
those for which (letting $w_1$ be the first one popped) $w_2$ was in the queue when $w_1$ was popped. There are
precisely $\sum_{t=0}^{k-1} (Y_t - 1)$ such pairs, i.e., we add the size of the queue minus the popped vertex over each
stage, except for the last stage. Since we are conditioning on TREE, the random variable $\sum_{t=0}^{k-1} (Y_t - 1)$
has distribution $M^*$. We now look at those pairs: each is adjacent with independent probability $p$, and
to have complexity $l$ we need to have exactly $l$ such pairs adjacent, so that the probability of this event
equals $A_3$. □
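Because the identity of Theorem 1.1 is exact, it can be tested with rational arithmetic for small $k$. The sketch below assumes the reading $A_1A_2A_3 = C(k,l)\,p^{k+l-1}(1-p)^{\binom{k}{2}-(k+l-1)}$, computes the left side from the balls-into-bins process, and the right side by brute-force enumeration of graphs. All names are illustrative and the enumeration is exponential in $k$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def count_connected(k, l):
    """C(k, l) by brute force over all edge subsets of size k - 1 + l."""
    edges = list(combinations(range(k), 2))
    count = 0
    for subset in combinations(edges, k - 1 + l):
        parent = list(range(k))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for a, b in subset:
            parent[find(a)] = find(b)
        if len({find(v) for v in range(k)}) == 1:
            count += 1
    return count


def lhs(k, l, p):
    """A_1 * A_2 * A_3, summing over all placements that satisfy TREE."""
    q = 1 - p
    norm = 1 - q ** k
    a1 = norm ** (k - 1)
    a2_a3 = Fraction(0)   # sum of Pr[placement] * Pr[BIN[M, p] = l] over TREE
    for pl in product(range(1, k + 1), repeat=k - 1):
        Z = [0] * (k + 1)
        for t in pl:
            Z[t] += 1
        y, tree = 1, True
        for i in range(1, k):
            y += Z[i] - 1
            if y <= 0:
                tree = False
                break
        if not tree:
            continue
        weight = Fraction(1)
        for t in pl:
            weight *= p * q ** (t - 1) / norm
        m = comb(k, 2) - sum(pl)          # value of M for this placement
        if m >= l:
            a2_a3 += weight * comb(m, l) * p ** l * q ** (m - l)
    return a1 * a2_a3


def rhs(k, l, p):
    """C(k, l) p^(k+l-1) (1-p)^(C(k,2) - (k+l-1))."""
    edges = k + l - 1
    return count_connected(k, l) * p ** edges * (1 - p) ** (comb(k, 2) - edges)
```

Calling `lhs` and `rhs` with `p = Fraction(1, 3)` and the same small `k`, `l` should return equal fractions; the value of $p$ is arbitrary because the theorem holds for every $p \in (0,1]$.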



Our approach to finding the asymptotics of $C(k,l)$ will be to find the asymptotics of $A_2$, $A_3$. This we
shall be able to do when, critically, $p$ has the appropriate value. We will let $p$ depend on $l$ and $k$, and the
choice of $p$ is described in more detail in Section 1.3. Looking ahead, we shall assume

$$k^{-3/2} \ll p \leq 10\,\frac{\ln k}{k}. \qquad (1.10)$$

It will be convenient to subdivide the possible $p$ into three regimes:

1. Very Large: $p \gg k^{-1}$ and $p \leq 10\frac{\ln k}{k}$;

2. Large: $p = \frac{c}{k}$ for some $c > 0$;

3. Small: $k^{-3/2} \ll p \ll k^{-1}$.

In each of these cases, we will write $p = \frac{c}{k}$, where $c \to 0$ when $p$ is small, and $c \to \infty$ when $p$ is very
large. The remainder of this section is organized as follows. In Section 1.3 we define how to choose
$p$ appropriately, and we show that the above three regimes for $p$ correspond to the three regimes of $l$
given earlier. In Section 1.4 we investigate two walk problems, and relate the probability of TREE to
the probability that these two walks do not revisit their starting point 0. In Section 1.5 we show that
both $M$ and, more importantly, $M^*$ obey a central limit theorem. Finally, in Section 2 we state the
consequences of our results concerning $\Pr[\mathrm{TREE}]$ and the asymptotic normality of $M^*$ for $C(k,l)$.



1.3 The Choice of Tilt 

Let $\mu$, $\sigma^2$ denote the mean and variance of $M$. Both of these have closed forms as a function of $p$. We
have the exact calculation

$$\mu = (k-1)\Big[\frac{k}{2} - E[T_1]\Big] = (k-1)\Big[\frac{k}{2} - \frac{1 - (k+1)p(1-p)^k - (1-p)^{k+1}}{p(1-(1-p)^k)}\Big]. \qquad (1.11)$$

We choose $p$ to satisfy the equation

$$p\mu = l. \qquad (1.12)$$

We can show from Calculus that $\mu = \mu(p)$ is an increasing function of $p$ and so (1.12) will have a unique
solution. The asymptotics depends on the regime. If $p = \frac{c}{k}$, then

$$p\mu \sim f_1(c)k \quad\text{and}\quad \sigma^2 \sim f_2(c)k^3 \qquad (1.13)$$

with

$$f_1(c) = c\Big[\frac{1}{2} - \frac{1-(c+1)e^{-c}}{c(1-e^{-c})}\Big] \qquad (1.14)$$

and, setting $\kappa = c(1-e^{-c})^{-1}$,

$$f_2(c) = \kappa\big[e^{-c}(-c^{-1} - 2c^{-2} - 2c^{-3}) + 2c^{-3}\big] - \kappa^2\big[e^{-c}(-c^{-1} - c^{-2}) + c^{-2}\big]^2. \qquad (1.15)$$

In particular, for $p$ small, and using that $f_1(c) \sim \frac{c^2}{12}$ and $f_2(c) \sim \frac{1}{12}$ when $c \downarrow 0$,

$$p\mu = \frac{p^2k^3}{12} + O(k^4p^3) \quad\text{and}\quad \sigma^2 \sim \frac{k^3}{12}, \qquad (1.16)$$

while for $p$ very large, and using that $f_1(c) \sim \frac{c}{2}$ and $f_2(c) \sim c^{-2}$ when $c \uparrow \infty$,

$$p\mu \sim \frac{pk^2}{2} \quad\text{and}\quad \sigma^2 \sim \frac{k}{p^2}. \qquad (1.17)$$

We see that the three regimes of $p$ do indeed correspond to the three regimes of $l$, as we show now. Indeed,
for $l$ small, we have that $p\mu \sim \frac{p^2k^3}{12} = l$ is equivalent to

$$p \sim k^{-3/2}\sqrt{12 l}. \qquad (1.18)$$

On the other hand, for $l$ large, if $l \sim k\beta$, then

$$p \sim \frac{c}{k} \quad\text{with}\quad f_1(c) = \beta, \qquad (1.19)$$

while for $l$ very large,

$$p \sim 2lk^{-2}. \qquad (1.20)$$

The asymptotics in (1.13) with $p \sim \frac{c}{k}$ can be found by approximating $k^{-1}T_j$ by the continuous truncated
exponential distribution over $[0, 1]$, which has density $ce^{-cx}/(1 - e^{-c})$.
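The tilt can be found numerically for concrete $k$ and $l$. The sketch below is one possible approach, not a procedure given in the text: it computes $\mu(p) = (k-1)[k/2 - E[T_1]]$ by direct summation over the truncated geometric law, cross-checks it against the closed form, and solves $p\mu(p) = l$ by bisection. The function names and the iteration count are illustrative choices.

```python
def mean_T(k, p):
    """E[T_1] for the truncated geometric distribution on 1..k."""
    q = 1.0 - p
    norm = 1.0 - q ** k
    return sum(i * p * q ** (i - 1) for i in range(1, k + 1)) / norm


def mu(k, p):
    """Mean of M, computed as (k-1) * (k/2 - E[T_1])."""
    return (k - 1) * (k / 2.0 - mean_T(k, p))


def mu_closed(k, p):
    """The same mean via the closed form for E[T_1]."""
    q = 1.0 - p
    num = 1.0 - (k + 1) * p * q ** k - q ** (k + 1)
    den = p * (1.0 - q ** k)
    return (k - 1) * (k / 2.0 - num / den)


def solve_tilt(k, l, iterations=200):
    """Solve p * mu(k, p) = l for p in (0, 1) by bisection, for l > 0."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid * mu(k, mid) < l:
            lo = mid          # tilt too small: p * mu is still below l
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For very small $p$ the mean $\mu(p)$ is negative (a uniform ball has mean position $(k+1)/2$), so $p\mu(p)$ lies below any positive $l$ there and the bisection still brackets the unique positive solution. The target $l$ must be below $(k-1)(k-2)/2$, the value of $p\mu$ at $p = 1$.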



1.4 Two Walks 

We define two basic walks. In application the $Z_i$, $Z_i^R$ below will be random variables of various sorts and
so $\mathrm{ESC}$, $\mathrm{ESC}_L$, $\mathrm{ESC}^R$, $\mathrm{ESC}_L^R$ become events.

Definition 2. Let $Z_1, Z_2, \ldots$ be nonnegative integers. The leftwalk is defined by the initial condition
$Y_0 = 1$ and the recursion $Y_i = Y_{i-1} + Z_i - 1$ for $i \geq 1$. The escape event, denoted $\mathrm{ESC}$, is that $Y_i > 0$ for
all $i \geq 1$. The event $\mathrm{ESC}_L$, or escape until time $L$, is that $Y_i > 0$ for $1 \leq i \leq L$.

It shall often be convenient to count the bins "from the right." Let $Z_i^R$, $1 \leq i \leq k$, denote the number
of balls in the $(k - i + 1)$-st bin. Set $Y_0^R = 0$ and $Y_i^R = Y_{i-1}^R + 1 - Z_i^R$ so that $Y_i^R = Y_{k-i}$. Then TREE
becomes

$$Y_t^R > 0 \quad \text{for } 1 \leq t \leq k - 1, \qquad (1.21)$$

or, alternatively,

$$Z_1^R + \cdots + Z_t^R \leq t - 1 \quad \text{for } 1 \leq t \leq k - 1. \qquad (1.22)$$

We shall generally use the superscript $R$ when examining bins from the right. In particular, we set
$i^R = k - i + 1$ so that bin $i^R$ is the $i$-th bin from the right.

Definition 3. Let $Z_1^R, Z_2^R, \ldots$ be nonnegative integers. The rightwalk is defined by the initial condition
$Y_0^R = 0$ and the recursion $Y_i^R = Y_{i-1}^R + 1 - Z_i^R$ for $i \geq 1$. The escape event, denoted $\mathrm{ESC}^R$, is that $Y_i^R > 0$
for all $i \geq 1$. The event $\mathrm{ESC}_L^R$, or escape until time $L$, is that $Y_i^R > 0$ for $1 \leq i \leq L$.

We allow $Z_i$, $Z_i^R$ to be defined only for $1 \leq i \leq L$, in which case $Y_i$, $Y_i^R$ are defined for $0 \leq i \leq L$ and
$\mathrm{ESC}_L$, $\mathrm{ESC}_L^R$ are well defined. Indeed, our main results will be for these walks of length $L$; the infinite
walks shall be a convenient auxiliary tool.

When $k - 1$ balls are placed into $k$ bins with tilt $p$ and $Z_i$ is the number of balls in bin $i$, the event
$\mathrm{ESC}_L$ is that $Y_t > 0$ for $1 \leq t \leq L$. Letting $Z_i^R$ be the number of balls in bin $i^R$, the event $\mathrm{ESC}_L^R$ is that
$Y_t^R > 0$ for $1 \leq t \leq L$.

Definition 4. Given $L < \frac{k}{2}$, we call bins with $1 \leq i \leq L$ the left side; bins with $1 \leq i^R \leq L$ the right
side; and all other bins the middle.

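Now consider the tilted balls into bins formulation of Section 1.1. Set

$$\lambda = (k-1)\frac{p}{1-(1-p)^k} \quad\text{and}\quad \lambda^R = (k-1)\frac{p(1-p)^{k-1}}{1-(1-p)^k}, \qquad (1.23)$$

so that $\lambda$, $\lambda^R$ are the expected number of balls in the leftmost and rightmost bin respectively. When
$p = \frac{c}{k}$, the asymptotics of $\lambda$ and $\lambda^R$ are given by

$$\lambda \sim \frac{c}{1-e^{-c}} \quad\text{and}\quad \lambda^R \sim \frac{ce^{-c}}{1-e^{-c}}. \qquad (1.24)$$

In particular, for $p$ very large, $\lambda \to \infty$ and $\lambda^R \to 0$, while for $p$ small, $\lambda = 1 + \frac{pk}{2}(1 + o(1))$ and
$\lambda^R = 1 - \frac{pk}{2}(1 + o(1))$.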

Theorem 1.2. Let $\mathrm{ESC}$ be given by Definition 2 with all $Z_i \sim \mathrm{Po}(\lambda)$. Let $\mathrm{ESC}^R$ be given by Definition 3
with all $Z_i^R \sim \mathrm{Po}(\lambda^R)$. Let $p$ be in the range given by (1.10). Then

$$\Pr[\mathrm{TREE}] \sim \Pr[\mathrm{ESC}]\,\Pr[\mathrm{ESC}^R]. \qquad (1.25)$$

We may naturally interpret Theorem 1.2 as saying that the probability of the event TREE is asymptotically
equal to the probability that the left and right sides satisfy the conditions imposed by TREE. For $i$ small, $Y_i$
behaves like a leftwalk with $Z_i$ being roughly $\mathrm{Po}(\lambda)$ and $Y_i^R$ behaves like a rightwalk with $Z_i^R$ being
roughly $\mathrm{Po}(\lambda^R)$. The proof of Theorem 1.2 is deferred to Section 3.

The left and right walks with $Z_i$, $Z_i^R$ independent Poisson have been well studied. Let $Z \sim \mathrm{Po}(1 + \varepsilon)$.
Let $Z_i \sim Z$ for all $i$, independent. Consider the leftwalk as given by Definition 2.

Theorem 1.3. $\Pr[\mathrm{ESC}] = y$ where $y$ is the unique real number in $(0, 1)$ such that $e^{-(1+\varepsilon)y} = 1 - y$.
Further, if $y = y(\varepsilon)$, then $y < 2\varepsilon$ for all positive $\varepsilon$ and $y \sim 2\varepsilon$ as $\varepsilon \to 0^+$.

Proof. We use that there is a bijection between random walks with i.i.d. steps with distribution $\mathrm{Po}(\lambda) - 1$
and Galton-Watson trees with offspring distribution $\mathrm{Po}(\lambda)$. This bijection is such that random walks
that never return to the origin are mapped to branching process configurations where the tree is infinite.
For the latter, we have that the probability is the survival probability of the branching process. The
extinction probability $x$ satisfies

$$e^{-\lambda(1-x)} = x. \qquad (1.26)$$

Therefore, with $\lambda = 1 + \varepsilon$, for the survival probability $y = 1 - x$, we obtain

$$e^{-(1+\varepsilon)y} = 1 - y. \qquad (1.27)$$

The inequality $y(\varepsilon) < 2\varepsilon$ and the asymptotics $y(\varepsilon) \sim 2\varepsilon$ are elementary calculus exercises. □
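The escape probability of the leftwalk can be evaluated numerically. This is a minimal sketch, assuming the fixed-point equation $e^{-(1+\varepsilon)y} = 1 - y$ and using bisection on $(0, 1]$; the function name, bracket and iteration count are illustrative choices.

```python
import math


def survival_probability(eps):
    """Positive root y of exp(-(1 + eps) * y) = 1 - y, for eps > 0."""
    lam = 1.0 + eps

    def h(y):
        # h(y) = (1 - y) - exp(-lam * y); expm1 keeps accuracy for small y.
        return -y - math.expm1(-lam * y)

    # h is concave with h(0) = 0 and h'(0) = eps > 0, so h is positive below
    # the root and negative above it, up to y = 1 where h(1) < 0.
    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Evaluating the function for decreasing `eps` shows the ratio `y / (2 * eps)` approaching one from below, which matches both statements of Theorem 1.3.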



Next let $Z^R \sim \mathrm{Po}(1 - \varepsilon)$. Let $Z_i^R \sim Z^R$ for all $i$, independent. Consider the rightwalk as given by
Definition 3. Then we can identify the probability of $\mathrm{ESC}^R$ exactly as follows:

Theorem 1.4. $\Pr[\mathrm{ESC}^R] = \varepsilon$.

Proof. Consider an infinite walk starting at zero with step size $1 - P$ where $P$ is Poisson with mean $1 - \varepsilon$.
Here, $\varepsilon \in (0, 1)$ but we do not assume $\varepsilon \to 0$. We claim $\Pr[\mathrm{ESC}^R] = \varepsilon$ precisely. Take an infinite random
walk $0 = Y_0, Y_1, Y_2, \ldots$ and let $W_n$ be the number of $i$, where $0 \leq i < n$, for which the walk
$S_t = Y_{i+t} - Y_i$ stays positive for all $t > 0$, i.e., the number of $i$, $0 \leq i < n$, for which $Y_j > Y_i$ for all $j > i$.

For each $i$ this has probability $\alpha = \Pr[\mathrm{ESC}^R]$ of occurring so that by linearity of expectation $E[W_n] = n\alpha$. Let
$V_n$ be the minimum of $Y_j$ for $j \geq n$. Then, by definition, $W_n = \max[V_n, 0]$. Indeed, for each $0 \leq j < V_n$,
let $i = i_j$ be the maximal $i$, $0 \leq i < n$, for which $Y_i = j$. These are precisely the $i$ for which the walk
beginning at time $i$ has the desired property. Thus $n\alpha = E[\max[V_n, 0]]$. So far everything is exact and
now it follows from the fact that the random walk has positive drift that

$$\alpha = \lim_{n\to\infty} \frac{E[\max[V_n, 0]]}{n} = \varepsilon. \qquad (1.28)$$

□
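Theorem 1.4 can be illustrated by simulation. The sketch below is a Monte Carlo estimate under two assumptions that are not part of the text: Poisson variates are drawn with Knuth's multiplicative method, and the infinite walk is truncated by declaring escape once the walk reaches a fixed height (`ceiling`), from which a return to zero is treated as impossible. The names, seed and trial count are illustrative.

```python
import math
import random


def poisson(rng, mean):
    """Poisson variate by Knuth's multiplicative method (small means only)."""
    threshold = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def escapes(rng, eps, ceiling=30):
    """Run one rightwalk with steps 1 - Po(1 - eps), started at 0.

    Returns False as soon as the walk is <= 0 at some time i >= 1, and True
    once it reaches `ceiling` (the truncation described above).
    """
    y = 0
    while True:
        y += 1 - poisson(rng, 1.0 - eps)
        if y <= 0:
            return False
        if y >= ceiling:
            return True


def estimate_escape_probability(eps, trials=20000, seed=0):
    """Fraction of simulated rightwalks that escape."""
    rng = random.Random(seed)
    return sum(escapes(rng, eps) for _ in range(trials)) / trials
```

The estimate should land close to `eps` itself, up to sampling error of order `trials ** -0.5`. The truncation is only reasonable when `eps` is not tiny, since a walk with small drift needs a much larger ceiling.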

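Suppose $p = \frac{c}{k}$ with $c > 0$ fixed. Then $\Pr[\mathrm{ESC}] = y$ where $e^{-\lambda y} = 1 - y$ by (1.27) and $\lambda$ is given by (1.24)
so that $\Pr[\mathrm{ESC}] \sim 1 - e^{-c}$. Theorem 1.4 gives $\Pr[\mathrm{ESC}^R] = 1 - \lambda^R$ where $\lambda^R$ is given by (1.24) so that
$\Pr[\mathrm{ESC}^R] \sim \frac{1 - (c+1)e^{-c}}{1 - e^{-c}}$. The asymptotics of Theorem 1.2 are then that for $p = \frac{c}{k}$,

$$\Pr[\mathrm{TREE}] \sim 1 - (c + 1)e^{-c}. \qquad (1.29)$$

In particular, for $p$ small

$$\Pr[\mathrm{TREE}] \sim \frac{c^2}{2} = \frac{(kp)^2}{2}, \qquad (1.30)$$

while for $p$ very large

$$\Pr[\mathrm{TREE}] \sim 1. \qquad (1.31)$$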

1.5 The Limiting Gaussian 

In this section, we give an asymptotic normal law for $M^*$ and the consequent asymptotics of $A_3$. For
$M$, by the fact that the $T_j$ are independent, Esseen's Inequality gives that $M$ is asymptotically Gaussian
with mean $\mu$ given in (1.11) and variance $\sigma^2$. Therefore, for any fixed real $u$,

$$\Pr[M \leq \mu + u\sigma] \sim \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt. \qquad (1.32)$$

Theorem 1.5. Let $M^*$ be given by Definition 1. Then for any fixed real $u$,

$$\Pr[M^* \leq \mu + u\sigma] \sim \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt. \qquad (1.33)$$

Here, importantly, $\mu$ is given by (1.11), the expectation of the unconditioned $M$. Theorem 1.5 then
has the natural interpretation that conditioning on TREE does not change the asymptotic distribution
of $M$. The proof of Theorem 1.5 is deferred to Section 3.

We next use Theorem 1.5 to determine the asymptotics of $A_3$. For this, we define $\sigma_Y$ by

$$\sigma_Y^2 = p\mu + p^2\sigma^2. \qquad (1.34)$$

Proposition 1.6. With $p$ given by (1.12) and $\sigma_Y$ given by (1.34), whenever $p^2\sigma^2 = O(p\mu)$,

$$\Pr[\mathrm{BIN}[M^*, p] = l] \sim \sigma_Y^{-1}(2\pi)^{-1/2}. \qquad (1.35)$$

Proof. We require only the asymptotic Gaussian distribution (1.32). Using infinitesimals, the probability
that $\sigma^{-1}(M^* - \mu) \in [u, u + du]$ is $(2\pi)^{-1/2}e^{-u^2/2}\,du$. Given that $M^*$ is in this range and using that $l = p\mu$,
we have

$$\Pr[\mathrm{BIN}[M^*, p] = l] \sim \Pr[\mathrm{BIN}[\mu + u\sigma, p] = p\mu]. \qquad (1.36)$$

Note that the mean of $\mathrm{BIN}[\mu + u\sigma, p]$ is equal to $p(\mu + u\sigma)$ and its variance is

$$p(1 - p)(\mu + u\sigma) \sim p(\mu + u\sigma) \sim p\mu = l, \qquad (1.37)$$

where in the latter equality, we use that $p\sigma = O(\sqrt{p\mu}) = O(\sqrt{l}) = o(l) = o(p\mu)$. Therefore, $p\mu$ is
$u(p\sigma)(p\mu)^{-1/2}$ standard deviations off the mean $p\mu + pu\sigma$ and so this probability is
$\sim (2\pi l)^{-1/2}\exp\big[-\frac{u^2p^2\sigma^2}{2p\mu}\big]$.

Since $p^2\sigma^2 = O(p\mu)$, the values as $u \to \pm\infty$ are negligible and

$$\Pr[\mathrm{BIN}[M^*, p] = l] \sim \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi l}}\, e^{-\frac{u^2p^2\sigma^2}{2p\mu}}\, \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\,du, \qquad (1.38)$$

which gives (1.35). □
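For completeness, the Gaussian integral in the last step can be evaluated explicitly. The display below is a worked sketch, assuming that the integrand is the local binomial estimate $(2\pi l)^{-1/2}\exp[-u^2p^2\sigma^2/(2p\mu)]$ multiplied by the standard normal density, and using $l = p\mu$ together with $\int_{-\infty}^{\infty}(2\pi)^{-1/2}e^{-au^2/2}\,du = a^{-1/2}$ for $a > 0$.

```latex
\begin{aligned}
\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi l}}
   \exp\!\Big[-\frac{u^2 p^2\sigma^2}{2p\mu}\Big]
   \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\,du
 &= \frac{1}{\sqrt{2\pi l}} \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}
   \exp\!\Big[-\frac{u^2}{2}\Big(1+\frac{p^2\sigma^2}{p\mu}\Big)\Big]\,du \\
 &= \frac{1}{\sqrt{2\pi p\mu}}\Big(1+\frac{p^2\sigma^2}{p\mu}\Big)^{-1/2} \\
 &= \frac{1}{\sqrt{2\pi}}\big(p\mu+p^2\sigma^2\big)^{-1/2}
  = \sigma_Y^{-1}(2\pi)^{-1/2}.
\end{aligned}
```

This shows where the combined variance $\sigma_Y^2 = p\mu + p^2\sigma^2$ comes from: $p\mu$ is the binomial variance and $p^2\sigma^2$ is the variance contributed by the fluctuation of $M^*$ itself.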

Again we can look at the asymptotics (we won't need finer expressions) in the different regimes.

1. When $l$ is small, then $p\mu = l \sim p^2\sigma^2$ and

$$\sigma_Y^2 \sim 2l. \qquad (1.39)$$

2. When $l$ is large, say $l \sim k\beta$, then $\sigma^2p^2 \sim c^2f_2(c)k$ with $c$ satisfying $f_1(c) = \beta$ as in (1.19) so that

$$\sigma_Y^2 \sim l\,(1 + c^2f_2(c)\beta^{-1}). \qquad (1.40)$$

3. When $l$ is very large, then $\sigma^2p^2 = o(p\mu)$ and

$$\sigma_Y^2 \sim l. \qquad (1.41)$$

2 Asymptotics for C(k,l) 

We now use the results in the previous section, in particular Theorems 1.1, 1.2 and 1.5, to derive the
asymptotics for $C(k,l)$. Indeed, for $p$ given by (1.12), the asymptotics of all terms in (1.6) are known
except $C(k,l)$. Hence we can solve for the asymptotics of $C(k,l)$. While Theorem 1.1 and the auxiliary
results allow us to find the asymptotics of $C(k, l)$ in theory, some of the technical work can be challenging.
Here we indicate some of the major cases.

It shall be helpful not to use the precise $p$ given by (1.12). Recall that Theorem 1.1 holds for any
value of $p$. For the moment let us write $\mu = \mu(p)$, $\sigma = \sigma(p)$, $A_2 = A_2(p)$ and $A_3 = A_3(p)$ to emphasize
this dependence.

Lemma 2.1. Let $p_0$ be the value of $p$ satisfying (1.12), i.e., $p_0\mu(p_0) = l$. Let $p$ be such that $p \sim p_0$ and
$p\mu(p) = l + o(l^{1/2})$. Then $A_2(p) \sim A_2(p_0)$, $\sigma(p) \sim \sigma(p_0)$ and $A_3(p) \sim A_3(p_0)$.

Proof. The asymptotics of $A_2(p) = \Pr[\mathrm{TREE}]$ are given by (1.29)-(1.31) and clearly have this property.
Similarly the formulae (1.13), (1.16)-(1.17) show that $\sigma(p) \sim \sigma(p_0)$. An examination of the proof of
Proposition 1.6 gives that the asymptotics (1.35) of $\Pr[\mathrm{BIN}[M^*, p_0] = l]$ apply to $\Pr[\mathrm{BIN}[M^*, p] = l]$ as
long as $l - p\mu(p) = o(l^{1/2})$, as the integral (1.38) remains the same. □
2.1 l Small

Theorem 2.2. When $l = o(k^{1/2})$,

$$C(k, l) \sim k^{k-2}\, k^{3l/2} \Big(\frac{e}{12l}\Big)^{l/2} (3\pi^{-1/2})\, l^{1/2}. \qquad (2.1)$$

Proof. Setting $p = k^{-3/2}\sqrt{12l}$, we have from (1.16) that $p\mu = l + O(k^4p^3)$ and $O(k^4p^3) = O(k^{-1/2}l^{3/2}) =
o(l^{1/2})$ as $l = o(k^{1/2})$. Lemma 2.1 then allows us to use this $p$ with $A_2$, $A_3$ given by the $p$ of (1.12). We
start with the exact formula

$$C(k, l) = A_1A_2A_3\, p^{-(k+l-1)} (1 - p)^{-\binom{k}{2}+(k+l-1)}. \qquad (2.2)$$

By (1.30), we have

$$A_2 = \Pr[\mathrm{TREE}] \sim \frac{1}{2}(kp)^2 = 6lk^{-1}. \qquad (2.3)$$

We further have $p^2\sigma^2 + p\mu \sim 2l \sim \frac{1}{6}p^2k^3$ so that Proposition 1.6 gives

$$A_3 = \Pr[\mathrm{BIN}[M^*, p] = l] \sim (2\pi)^{-1/2}(2l)^{-1/2}. \qquad (2.4)$$

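We have to be quite careful with the asymptotics of the exact formula $A_1 = [1 - (1 - p)^k]^{k-1}$ of Theorem
1.1. We have

$$1 - (1 - p)^k = pk\Big(1 - \frac{pk}{2} + \frac{p^2k^2}{6} - \frac{p^3k^3}{24} + O(p^4k^4 + p)\Big), \qquad (2.5)$$

and

$$\ln\Big(1 - \frac{pk}{2} + \frac{p^2k^2}{6} - \frac{p^3k^3}{24} + O(p^4k^4 + p)\Big) = -\frac{pk}{2} + \frac{p^2k^2}{6} - \frac{p^3k^3}{24} - \frac{1}{2}\Big(\frac{p^2k^2}{4} - \frac{p^3k^3}{6}\Big) - \frac{p^3k^3}{24} + O(p^4k^4 + p) = -\frac{pk}{2} + \frac{l}{2k} + o(k^{-1}), \qquad (2.6)$$

as long as $pk^{3/2} = o(k^{1/4})$, which occurs if $l = o(k^{1/2})$. This gives

$$A_1 \sim p^{k-1}k^{k-1}e^{-pk^2/2}e^{l/2}. \qquad (2.7)$$

We further have the asymptotics

$$p^{k+l-1} = p^{k-1}\,(12l)^{l/2}\,k^{-3l/2}, \qquad (2.8)$$

and

$$(1 - p)^{k^2/2} \sim e^{-pk^2/2}, \qquad (2.9)$$

while

$$(1 - p)^{\frac{3k}{2}+l-1} \sim 1. \qquad (2.10)$$

Then Theorem 1.1 puts everything together and yields (2.1). □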



2.2 l Large

In this section, we take $l$ such that $\beta = l/k$ is uniformly bounded and uniformly positive, and investigate
the scaling of $C(k,l)$ in this range. We state the result uniformly in $\beta \in [\varepsilon, \varepsilon^{-1}]$, since we cannot fix $\beta$
due to the fact that $l = \beta k$ needs to be an integer. The main result is as follows:

Theorem 2.3. Let $\varepsilon > 0$ be arbitrary and fixed. Then as $k \to \infty$ and $l = l(k) \in [\varepsilon k, \varepsilon^{-1}k]$,

$$C(k, k\beta) \sim A \cdot B^k \cdot k^{(1+\beta)k} \cdot k^{-3/2}, \qquad (2.11)$$
where $\beta = l/k$, $c$ is the solution to

$$e^{-c} = \frac{2(\beta + 1) - c}{2(\beta + 1) + c}, \qquad (2.12)$$

$$A = \frac{c(c - 2\beta)\, e^{-c(\beta/2+1)}}{2\sqrt{2\pi\beta\,(1 + c^2f_2(c)/\beta)}}, \qquad (2.13)$$

and

$$B = \frac{2e^{c/2}}{c^{\beta}\,(2(\beta + 1) + c)}. \qquad (2.14)$$
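The constant $c$ in Theorem 2.3 has no closed form but is easy to compute. The sketch below assumes that $c$ is the positive root of $e^{-c} = (2(\beta+1) - c)/(2(\beta+1) + c)$, which is a rearrangement of $f_1(c) = \beta$ with $f_1(c) = c/2 - (1-(c+1)e^{-c})/(1-e^{-c})$, and finds it by bisection. The bracket $[2\beta, 2(\beta+1)]$ and the function names are illustrative choices.

```python
import math


def f1(c):
    """f_1(c) = c * (1/2 - mean of the truncated exponential on [0, 1])."""
    return c / 2.0 - (1.0 - (c + 1.0) * math.exp(-c)) / (-math.expm1(-c))


def solve_c(beta):
    """Positive root c of exp(-c) = (2(beta+1) - c) / (2(beta+1) + c)."""
    a = 2.0 * (beta + 1.0)

    def g(c):
        return math.exp(-c) * (a + c) - (a - c)

    # g(2*beta) = 2*((1 + 2*beta)*exp(-2*beta) - 1) < 0 and g(a) > 0.
    lo, hi = 2.0 * beta, a
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The root always satisfies $c > 2\beta$, since $f_1(c) < c/2$; this is the positivity one would expect of a factor $c - 2\beta$ in the leading constant.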
Proof. Let $l$ satisfy $l = l(k) \in [\varepsilon k, \varepsilon^{-1}k]$. Then the $p$ of (1.12) satisfies

$$p = \frac{c}{k} + O(k^{-2}), \qquad (2.15)$$

where $c$ is the solution to (2.12). Changing $p$ by an additive $O(k^{-2})$ term changes $p\mu(p)$ by $O(1)$. Lemma
2.1 allows us to set $p = \frac{c}{k}$ with $A_2$, $A_3$ the same as for that $p$ given by (1.12). We get $C(k, \beta k)$ from the
equation

$$A_1A_2A_3 = C(k, \beta k)\, p^{k+\beta k-1} (1 - p)^{\binom{k}{2}-k-\beta k+1}. \qquad (2.16)$$

Here, taking care to note that the asymptotics $(1 - p)^k \sim e^{-c}$ are not sufficiently precise to give the
asymptotics of $A_1$, we find

$$A_1 \sim \Big(1 - e^{-c}\Big(1 - \frac{c^2}{2k}\Big)\Big)^{k-1} \sim \Big(\frac{2c}{2(\beta + 1) + c}\Big)^{k-1} e^{((\beta+1)/c - 1/2)\,c^2/2}, \qquad (2.17)$$

while

$$A_2 \sim 1 - (c + 1)e^{-c}, \quad\text{and}\quad A_3 \sim \frac{1}{\sqrt{2\pi k\beta\,(1 + c^2f_2(c)/\beta)}}. \qquad (2.18)$$

Further,

$$p^{k(1+\beta)-1} = \Big(\frac{c}{k}\Big)^{k(1+\beta)-1}, \qquad (2.19)$$

while

$$(1 - p)^{\binom{k}{2}-k(1+\beta)+1} \sim e^{-kc/2 - c^2/4 + c(\beta+1.5)}. \qquad (2.20)$$

Solving and employing uniformity of convergence we obtain Theorem 2.3. □
2.3 l Very Large

As a third example suppose $l = \lfloor ck\ln k \rfloor$. We prove the following result:

Theorem 2.4. For $l = \lfloor ck\ln k \rfloor$ with $c > \frac{1}{2}$,

$$C(k, l) \sim \binom{\frac{k(k-1)}{2}}{k + l - 1}. \qquad (2.21)$$

This has the interpretation that the proportion of graphs with $k$ vertices and $k + l - 1$ edges which are
connected is asymptotically one, or that the probability that a random graph with $k$ vertices and $k + l - 1$
edges is connected is asymptotically one. As such, this is immediate from a classic result of Erdos and
Renyi 0. 
Proof. We again start from (2.2). Then (1.20) gives that $p \sim 2lk^{-2}$, which implies $A_1 \sim A_2 \sim 1$. (Note
that $A_1 \sim 1$ fails for $c < \frac{1}{2}$.) Further, Proposition 1.6 with the asymptotics in (1.41) gives $A_3 \sim (2\pi l)^{-1/2}$.
It shall be convenient to rewrite this as $A_3 \sim (2\pi(k + l - 1))^{-1/2}$. We conclude that

$$C(k, l) \sim (2\pi(k + l - 1))^{-1/2}\, p^{-(k+l-1)} (1 - p)^{-\binom{k}{2}+(k+l-1)} = (2\pi B)^{-1/2}\, p^{-B} (1 - p)^{-(A-B)}, \qquad (2.22)$$

where we abbreviate $A = \binom{k}{2}$, $B = k + l - 1$. However, $p \sim 2lk^{-2}$ is not a sufficiently precise approximation of
$p$ to give the asymptotics of $C(k,l)$. Rather, in the region $p \sim 2lk^{-2}$, the exact expression (1.11) can be
rewritten as follows:

Lemma 2.5.

$$\mu = \binom{k}{2} - (k - 1)p^{-1} + \frac{k(k - 1)(1 - p)^k}{1 - (1 - p)^k}. \qquad (2.23)$$

Proof. This is a simple calculation. □

By Lemma 2.5 and using that $1 - (1 - p)^k \sim 1$, we obtain that

$$\mu = \binom{k}{2} - (k - 1)p^{-1} + O(k^2(1 - p)^k). \qquad (2.24)$$

As we have required from (1.12) that $p\mu = l$, we have

$$p\binom{k}{2} = k + l - 1 + O(pk^2(1 - p)^k). \qquad (2.25)$$

We note that

$$\binom{A}{B} p^B (1 - p)^{A-B} = \Pr[\mathrm{BIN}[A, p] = B] \sim (2\pi B)^{-1/2}, \qquad (2.26)$$

where the latter equality holds by the local central limit theorem for the binomial distribution whenever
$B - pA = o(\sqrt{p(1 - p)A})$. Note that by (2.25), with $A = \binom{k}{2}$ and $B = k + l - 1$,

$$B - pA = (k + l - 1) - p\binom{k}{2} = O(pk^2(1 - p)^k) = o(\sqrt{p(1 - p)A}) = o(k\sqrt{p}), \qquad (2.27)$$

precisely when

$$(1 - p)^k\sqrt{pk^2} = o(1). \qquad (2.28)$$

Since $p \sim 2lk^{-2}$, this holds precisely when

$$\sqrt{l}\,e^{-2l/k} = o(1), \qquad (2.29)$$

which is true whenever $l = ck\ln k$ with $c > \frac{1}{4}$. Therefore, in this case, Theorem 1.1 gives

$$C(k, l)\, p^B (1 - p)^{A-B} \sim (2\pi B)^{-1/2}, \qquad (2.30)$$

and thus we deduce

$$C(k, l) \sim \binom{\frac{k(k-1)}{2}}{k + l - 1}.$$

□

We note that in principle it is possible to extend the above asymptotics to other $l$ for which $l/k \to \infty$,
using Lemma 2.5 and more precise local central limit theorems for $\Pr[\mathrm{BIN}[A, p] = B]$.






3 The Technical Theorems 



In this section, we prove Theorems 1.2 and 1.5. The values $\lambda$, $\lambda^R$, the expected number of balls in the
first and last bins respectively, are given by (1.23). We start in Section 3.1 with the easy case where $p$ is
large and very large. The remaining Sections 3.2-3.6 are devoted to the hard case where $p \gg k^{-3/2}$.

3.1 The Easy Case: p Very Large and p Large

We note that the arguments for the "hard case" apply to the cases where $p$ is large and very large as
well. However, many of the subtleties of the hard case can be avoided when $p = \Omega(k^{-1})$. Here we give,
without full details, a simpler argument that works in these important cases.

First suppose $p \gg k^{-1}$. Let $\mathrm{FAIL}_t$ be the event $Y_t \leq 0$. For example, $\mathrm{FAIL}_1$ is the event $Z_1 = 0$,
which has probability $e^{-\lambda(1+o(1))}$, which approaches zero. The event $\mathrm{FAIL}_{k-1}$ is the event $Z_1^R > 0$ so
$\Pr[\mathrm{FAIL}_{k-1}] \leq E[Z_1^R] = \lambda^R \to 0$. In general, as each ball is dropped independently, $Z_1 + \cdots + Z_t$ has
distribution $\mathrm{BIN}[k - 1, \alpha]$ where $\alpha = \Pr[T_j \leq t]$ as given by the distribution (1.1). (Near the right
side it is easier to work with $\Pr[Y_t^R \leq 0]$.) Chernoff bounds give that $\sum_{t=1}^{k-1} \Pr[\mathrm{FAIL}_t] \to 0$ and so
$\Pr[\mathrm{TREE}] \to 1$, giving Theorem 1.2. Conditioning on an event that holds with probability $1 - o(1)$
cannot affect an asymptotic Gaussian distribution and so Theorem 1.5 follows immediately for the very
large case.

We next proceed with the case where $p$ is large. Set $p = \frac{c}{k}$. Note $\lambda$, $\lambda^R$ are given by (1.23). We split
bins into left, right and middle by Definition 4. We set $L = \lfloor \ln^2 k \rfloor$ for definiteness, though a fairly wide
range of $L$ would do. With $\mathrm{FAIL}_t$ as above, Chernoff bounds give $\sum_{L \leq t \leq k-L} \Pr[\mathrm{FAIL}_t] = o(1)$. With
probability $1 - o(1)$, no bin on the left nor right side has more than $\ln^2 k$ balls so that the total number
of balls on the left and right side is less than $\ln^4 k$. Thus, $\Pr[\mathrm{TREE}]$ is within $o(1)$ of the probability that
both sides have less than $\ln^4 k$ balls and that the leftwalk satisfies $\mathrm{ESC}_L$ and that the rightwalk satisfies
$\mathrm{ESC}_L^R$.

Let $Z_i^* \sim \mathrm{Po}(\lambda)$, $1 \leq i \leq L$, be independent. Let $Z_i^{R*} \sim \mathrm{Po}(\lambda^R)$, $1 \leq i \leq L$, be independent. Placing
balls into the left and right sides with these distributions, with probability $1 - o(1)$ both left and right sides
get less than $\ln^4 k$ balls. Both $\Pr[\mathrm{ESC}_L]$, $\Pr[\mathrm{ESC}_L^R]$ are within $o(1)$ of $\Pr[\mathrm{ESC}]$, $\Pr[\mathrm{ESC}^R]$ for the infinite
walks and, as they are now independent, $\Pr[\mathrm{ESC}_L \wedge \mathrm{ESC}_L^R]$ would be within $o(1)$ of $\Pr[\mathrm{ESC}]\Pr[\mathrm{ESC}^R]$. For
any fixed nonnegative integers $x_1, \ldots, x_L; x_1^R, \ldots, x_L^R$ the probability that $Z_i = x_i$, $1 \leq i \leq L$, and $Z_i^R = x_i^R$,
$1 \leq i \leq L$, approaches the same probability with the $Z_i$, $Z_i^R$ replaced by the independent Poissons $Z_i^*$, $Z_i^{R*}$.
Hence, $\Pr[\mathrm{TREE}]$ is within $o(1)$ of $\Pr[\mathrm{ESC}]\Pr[\mathrm{ESC}^R]$, giving Theorem 1.2.

We next proceed with the central limit theorem, Theorem 1.5. The proof of this result is more subtle,
and we need to show that both mean and variance are not affected by the conditioning. Consider any
fixed nonnegative integers $x_1, \ldots, x_L; x_1^R, \ldots, x_L^R$ so that, with $x_i$ balls in bin $i$ and $x_i^R$ balls in bin $i^R$, the
events $\mathrm{ESC}_L$ and $\mathrm{ESC}_L^R$ both hold. Set $m_L = x_1 + \cdots + x_L$, $m_R = x_1^R + \cdots + x_L^R$ and further assume
$m_L \leq \ln^4 k$ and $m_R \leq \ln^4 k$. Let $M^{**}$ be the distribution of $\binom{k}{2} - \sum_j T_j$ where we assume that all remaining
balls are placed in the middle bins with the truncated geometric distribution. Thus, the law of $M^{**}$ is the
law of $M^*$ conditioned on $m_L \leq \ln^4 k$ and $m_R \leq \ln^4 k$. Let $\mu^{**} = E[M^{**}]$. Then, the following proposition
shows that the conditioning does not affect the mean too much:

Proposition 3.1. $\mu^{**} - \mu = O(k\ln^4 k)$.

Proof. Let us call these distributions fixededge and unrestricted respectively. There are two differences
between these distributions. First, the $m_L + m_R$ balls are explicitly placed in the fixededge distribution.
The difference in expectation for any particular ball can be at most $k$ so the total difference for these
$O(\ln^4 k)$ balls is $O(k\ln^4 k)$. For the other balls the distinction is between the truncated geometric
and the unrestricted distribution. Let $Y^T$, $Y^U$ be the placement of a single ball in these two distributions.
Consider the experiment of selecting $Y^U$ from the unrestricted distribution and then, if it did not land in
a middle bin, reassigning it with the truncated geometric to obtain $Y^T$. With this linkage we have $Y^T \neq Y^U$ only when
the reassignment is made, which occurs with probability $O(k^{-1}\ln^2 k)$. When it does occur the values are,
as always, within $k$. Hence the difference in the expectations is $O(\ln^2 k)$. The total difference for all (at
most $k$) of these balls is then $O(k\ln^2 k)$. Thus $\mu^{**} - \mu = O(k\ln^4 k) + O(k\ln^2 k)$, giving Proposition 3.1. □

Now we claim that $M^{**}$ satisfies the asymptotic Gaussian (1.32). We may write $M^{**} = a - \sum_j T_j^{**}$
where $a$ is a constant which depends on the fixed placement, the sum ranges over those $j$ for which ball
$j$ goes into the middle, and $T_j^{**}$ has the distribution of $T$ given by (1.1) conditioned on it being in the
middle. We claim $M^{**}$ has variance $\sim \sigma^2$ with $\sigma^2$ given by (1.13). For $M, M^{**}$ the variance comes from
the independent $T_j, T_j^{**}$ respectively. There are $k - 1$ and $\sim k$ terms respectively. The variance of each
$T_j$ and each $T_j^{**}$ is $\sim f_2(c) k^2$. An easy way to see this is that $k^{-1} T_j^{**}$ has the asymptotic continuous
distribution on $[0,1]$ with density $c e^{-cx}/(1 - e^{-c})$, which is the asymptotic law of $k^{-1} T_j$ when $k \to \infty$.
From Esseen's Inequality, $M^{**}$ is asymptotically Gaussian with mean $\mu^{**}$ and variance $\sim \sigma^2$. Since
$\mu - \mu^{**} = O(k \ln^4 k) = o(k^{3/2})$, $M^{**}$ is asymptotically Gaussian with the original $\mu, \sigma^2$.

Finally, we consider $M^*$. In the unconditioned placement of balls the probability that either $m_L > \ln^4 k$
or $m_R > \ln^4 k$ was $o(1)$. We are now conditioning on TREE but we have already shown that, in this regime,
$\Pr[\mathrm{TREE}]$ is bounded away from zero. Hence, in the conditioned placement of balls the probability that
either $m_L > \ln^4 k$ or $m_R > \ln^4 k$ is still $o(1)$. Therefore, excluding $o(1)$ probability, $M^*$ is a combination
of distributions $M^{**}$, each of which is asymptotically Gaussian with the same mean and variance. Hence,
$M^*$ is as well. This completes the proof of Theorem 1.5 in the case when $p$ is large. □



3.2 The Hard Case 

In Sections 3.2-3.6 we study the general case where $pk^{3/2} \to \infty$. Our arguments can be made considerably
simpler when $p$ is not too close to the lower bound $k^{-3/2}$. When we present the general results, we will
indicate the simplification when $p = k^{-1.4}$. These simplifications actually work down to $k^{-3/2}$ times a
polylog factor.

We split the $k$ bins into left, middle and right sides as in the definition given earlier. We carefully choose $L$
so that

$$(kp)^{-2} \ll L \ll k^{-1/2} p^{-1}. \tag{3.1}$$

For example, when $p = k^{-1.4}$, we set $L = k^{0.85}$, far away from both bounds of (3.1).

Note that the lower bound $pk^{3/2} \to \infty$ on $p$ allows us to do this. Also note that $k^{-1/2} p^{-1} \ll k$ so that

$$L \ll k. \tag{3.2}$$

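We set

$$\varepsilon = \frac{pk}{2} = o(1) \tag{3.3}$$

since $p$ is small. A careful analysis of (1.1) gives that

$$\Pr[T_j = i] = \frac{1}{k}(1 + \varepsilon + o(\varepsilon)) \quad \text{for } 1 \le i \le L, \tag{3.4}$$

and

$$\Pr[T_j = i^R] = \frac{1}{k}(1 - \varepsilon + o(\varepsilon)) \quad \text{for } 1 \le i \le L. \tag{3.5}$$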

Roughly speaking, each bin on the left side will get $\mathrm{Po}(1 + \varepsilon)$ balls, while the bins on the right side will get
$\mathrm{Po}(1 - \varepsilon)$ balls. It shall turn out that the event TREE is dominated by the events of (1.3) for $1 \le t \le L$
and the events of (1.22) for $1 \le t \le L$.
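
The estimates (3.4) and (3.5) can be checked numerically. The following Python sketch assumes that the left-tilt of (1.1) is the truncated geometric law with $\Pr[T_j = i]$ proportional to $p(1-p)^{i-1}$ on bins $1, \ldots, k$; this form is an assumption, since (1.1) is not restated in this section, and the function name is purely illustrative.

```python
def truncated_geometric_tilt(k, p):
    """Bin probabilities for bins 1..k, proportional to p * (1 - p)**(i - 1).

    Assumed form of the left-tilt; normalizing makes the weights sum to 1.
    """
    weights = [p * (1.0 - p) ** (i - 1) for i in range(1, k + 1)]
    total = sum(weights)
    return [w / total for w in weights]


k = 10_000
p = k ** -1.4            # the running example p = k^(-1.4)
eps = p * k / 2          # epsilon = pk/2 as in (3.3)
probs = truncated_geometric_tilt(k, p)

left_ratio = k * probs[0]     # k * Pr[T_j = 1], compare with 1 + eps
right_ratio = k * probs[-1]   # k * Pr[T_j = k], compare with 1 - eps
print(eps, left_ratio, right_ratio)
```

Under this assumption the two ratios differ from $1 \pm \varepsilon$ only at order $\varepsilon^2$, which is the $o(\varepsilon)$ term in (3.4) and (3.5).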






3.3 Scaling for Small Bias Walks 



Mathematical physicists well understand that walks with a bias $\varepsilon = o(1)$ are naturally scaled by time
$\varepsilon^{-2}$. Up to time $O(\varepsilon^{-2})$ the walk behaves as if it had zero drift and afterwards the drift takes over.
Propositions 3.2-3.3 below investigate the probability of never returning to the starting point, and are
quite natural. We write $\Pr^*$ for the law where each bin $1, 2, \ldots$ receives a Poisson$(1 + \varepsilon)$ number of balls.

Proposition 3.2. If $\varepsilon \to 0^+$ and $L \to \infty$ is such that $L \gg \varepsilon^{-2}$, then $\Pr^*[\mathrm{ESC}_L] \sim 2\varepsilon$.

Proof. As $\Pr^*[\mathrm{ESC}] \sim 2\varepsilon$ it suffices to show $\Pr^*[\neg\mathrm{ESC} \wedge \mathrm{ESC}_L] = o(\varepsilon)$.

In the simpler case when $p = k^{-1.4}$, so $\varepsilon = k^{-0.4}/2$ and $L = k^{0.85}$, we can bound $\Pr^*[\neg\mathrm{ESC} \wedge \mathrm{ESC}_L]$ by the
sum over $t > L$ of $\Pr^*[Z_1 + \cdots + Z_t < t]$. Here $Z_1 + \cdots + Z_t \sim \mathrm{Po}(t(1 + \varepsilon))$. Basic Chernoff bounds show
that this probability is so low and drops so fast that summed over all $t > L$ it is $o(\varepsilon)$. Indeed, it is of
the form $\exp[-k^{c+o(1)}]$ for some positive constant $c$. Now we extend the proof to the small $p$'s for which
$pk^{3/2} \to \infty$.

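Consider the infinite walk and let $W$ be the number of $t > L$ such that $Y_t \le L\varepsilon/2$. Parametrize $t = Lx$.
Then

$$\Pr^*[Y_t \le L\varepsilon/2] \le \Pr^*[\mathrm{Po}(Lx(1 + \varepsilon)) \le Lx + L\varepsilon/2]. \tag{3.6}$$

Basic Chernoff bounds give that this is at most $\exp[-(Lx\varepsilon)^2/(8Lx(1 + \varepsilon))] \le \exp[-L\varepsilon^2 x/16]$. Since
$L\varepsilon^2 \gg 1$, this probability is $o(1)$ uniformly in $x \ge 1$, and it decays geometrically in $t$. Therefore, by linearity of expectation,
$E[W] = o(L)$. Let $B$ be the event that $Y_t = 0$ for some $t > L$. Then we claim $E[W \mid B] \ge 0.98L$.

Indeed, consider the first such $t > L$ with $Y_t = 0$. Conditionally on the history up to time $t$, we
have $\Pr^*[Y_{t+s} \le L\varepsilon/2] \ge 0.99$ for all $0 \le s \le 0.99L$. As $E[W] \ge E[W \mid B]\,\Pr^*[B]$ we deduce that
$\Pr^*[B] = o(1)$. Now $\mathrm{ESC}_L$ is an increasing event and $B$ is a decreasing event, so that by the FKG
inequality

$$\Pr^*[\mathrm{ESC}_L \wedge B] \le \Pr^*[\mathrm{ESC}_L]\,\Pr^*[B], \tag{3.7}$$

so that $\Pr^*[\mathrm{ESC}_L \wedge \neg\mathrm{ESC}] = \Pr^*[\mathrm{ESC}_L \wedge B] = o(\Pr^*[\mathrm{ESC}_L])$. We conclude that

$$\Pr^*[\mathrm{ESC}_L] = \Pr^*[\mathrm{ESC}] + \Pr^*[\mathrm{ESC}_L \wedge B] = \Pr^*[\mathrm{ESC}] + o(\Pr^*[\mathrm{ESC}_L]), \tag{3.8}$$

so that $\Pr^*[\mathrm{ESC}_L] \sim \Pr^*[\mathrm{ESC}]$. Since, by the theorem quoted earlier, we have $\Pr^*[\mathrm{ESC}] \sim 2\varepsilon$, Proposition 3.2
follows. □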
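
The asymptotic $\Pr^*[\mathrm{ESC}] \sim 2\varepsilon$ used above can be illustrated numerically. The sketch below assumes that ESC is the event that the walk $Y_0 = 1$, $Y_t = Y_{t-1} + Z_t - 1$ with i.i.d. Poisson$(1+\varepsilon)$ steps never reaches zero, and uses the standard identification of this event with survival of a Galton-Watson process with Poisson$(1+\varepsilon)$ offspring. Both the identification and the function name are assumptions made for illustration, not part of the argument in the text.

```python
import math


def escape_probability(eps, tol=1e-15, max_iter=1_000_000):
    """Survival probability of a Poisson(1 + eps) branching process.

    The extinction probability q is the smallest root of
    q = exp(lam * (q - 1)); iterating from q = 0 increases monotonically to it.
    """
    lam = 1.0 + eps
    q = 0.0
    for _ in range(max_iter):
        q_next = math.exp(lam * (q - 1.0))
        if abs(q_next - q) < tol:
            q = q_next
            break
        q = q_next
    return 1.0 - q


for eps in (0.1, 0.01, 0.001):
    # The ratio to 2 * eps should approach 1 as eps decreases.
    print(eps, escape_probability(eps) / (2 * eps))
```

The fixed-point iteration converges slowly near criticality (at rate roughly $1 - \varepsilon$ per step), which is itself a small reflection of the $\varepsilon^{-2}$ time scale discussed at the start of this section.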



The next proposition gives a similar result for $\mathrm{ESC}^R_L$. In its statement, we let $\Pr^*_{-\varepsilon}$ denote the probability
law where each bin $1, 2, \ldots$ receives a Poisson$(1 - \varepsilon)$ number of balls.

Proposition 3.3. If $\varepsilon \to 0^+$ and $L \to \infty$ is such that $L \gg \varepsilon^{-2}$, then $\Pr^*_{-\varepsilon}[\mathrm{ESC}^R_L] \sim \varepsilon$.

Proof. Similar to the proof of Proposition 3.2. □

We further require two small extensions: 

Corollary 3.4. Let $\varepsilon \to 0^+$ and $L \gg \varepsilon^{-2}$. Let $\lambda_1, \ldots, \lambda_L$ be such that all $\lambda_i = 1 + \varepsilon + o(\varepsilon)$. Let
$Z_i \sim \mathrm{Po}(\lambda_i)$ be independent and consider the leftwalk defined earlier. Then $\Pr[\mathrm{ESC}_L] \sim 2\varepsilon$.

Proof. For any fixed $\delta > 0$ we can sandwich this model between one in which all $\lambda_i = 1 + \varepsilon(1 - \delta)$ and
one in which all $\lambda_i = 1 + \varepsilon(1 + \delta)$. From Proposition 3.2 we bound $\Pr[\mathrm{ESC}_L]$ between $\sim 2\varepsilon(1 - \delta)$ and
$\sim 2\varepsilon(1 + \delta)$. As $\delta$ can be arbitrarily small this gives Corollary 3.4. □

Corollary 3.5. Let $\varepsilon \to 0^+$ and $L \gg \varepsilon^{-2}$. Let $\lambda^R_1, \ldots, \lambda^R_L$ be such that all $\lambda^R_i = 1 - \varepsilon + o(\varepsilon)$. Let
$Z^R_i \sim \mathrm{Po}(\lambda^R_i)$ be independent and consider the rightwalk defined earlier. Then $\Pr[\mathrm{ESC}^R_L] \sim \varepsilon$.

Proof. Similar to the proof of Corollary 3.4, now using Proposition 3.3 instead of Proposition 3.2. □



3.4 Poisson versus Fixed 

It will often be convenient to start with a Poisson number of balls with parameter $k$, rather than with
precisely $k$ balls. In the Poisson case, the numbers of balls in the bins are independent Poisson random
variables, which simplifies the analysis. We therefore need to compare the probability that an event holds
for a Poisson number of balls with the probability that it holds for a fixed number of balls. In this section,
we prove a result that allows this comparison, and which in particular lets us convert a probability under
the Poisson law into a statement for a fixed number of balls.

We first introduce some notation that allows us to make this comparison. Consider an event $A$ that
depends on a nonnegative integer variable $X$. Let $g(\lambda)$ be $\Pr[A]$ when $X$ has a Poisson distribution with
mean $\lambda$. Let $f(m)$ be $\Pr[A]$ when $X = m$. (As an important example, drop $X$ balls into bins $1, \ldots, L$
with left-tilt $p$.) These are related by the equality

$$g(\lambda) = \sum_{m=0}^{\infty} f(m) \Pr[\mathrm{Po}(\lambda) = m]. \tag{3.9}$$

Here we want to go from asymptotics of $g$ to asymptotics of $f$. We would naturally want to say that
$g(m)$ and $f(m)$ are quite close. This is true when $f$ and $g$ are increasing or decreasing.

We say $A$ is increasing if $f, g$ are increasing; decreasing if $f, g$ are decreasing; and monotone if one of
those holds. For balls into bins models, an event $A$ is increasing when $A$ keeps on holding when extra balls
are added. An event $A$ is decreasing when $A^c$ is increasing. In particular, $\mathrm{ESC}_L$, $\mathrm{ESC}^R_L$ are increasing
and decreasing respectively. When $A$ is monotone and $g$ is relatively smooth the following result allows
us to derive the asymptotics of $f$ from those of $g$:

Lemma 3.6. Let $\lambda_1, \lambda_2 \to \infty$ with $\lambda_2 = \lambda_1 + \omega\lambda_1^{1/2}$ where $\omega \to \infty$. Suppose $g(\lambda_1) \sim g(\lambda_2)$. Then:
If $A$ is increasing, then $f(\lambda_1) \le (1 + o(1)) g(\lambda_2)$.
If $A$ is decreasing, then $f(\lambda_2) \le (1 + o(1)) g(\lambda_1)$.
If $A$ is increasing, then $f(\lambda_2) \ge (1 + o(1)) g(\lambda_1)$.
If $A$ is decreasing, then $f(\lambda_1) \ge (1 + o(1)) g(\lambda_2)$.

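Proof. Assume $A$ is increasing. Truncating (3.9), taken at $\lambda = \lambda_2$, to $m \ge \lambda_1$ gives $g(\lambda_2) \ge f(\lambda_1) \Pr[\mathrm{Po}(\lambda_2) \ge \lambda_1]$.
Chebyshev's Inequality gives that the probability is $1 - o(1)$, giving the first part of Lemma 3.6. Now we
show the third part. Calculation gives that for $j \ge \lambda_2$, $\Pr[\mathrm{Po}(\lambda_2) = j] \gg \Pr[\mathrm{Po}(\lambda_1) = j]$. Now consider
the expansion (3.9) for both $\lambda = \lambda_1$ and $\lambda = \lambda_2$. We bound

$$\sum_{m \ge \lambda_2} f(m) \Pr[\mathrm{Po}(\lambda_1) = m] \ll \sum_{m \ge \lambda_2} f(m) \Pr[\mathrm{Po}(\lambda_2) = m] \le g(\lambda_2) \sim g(\lambda_1), \tag{3.10}$$

so that

$$g(\lambda_1) \sim \sum_{m < \lambda_2} f(m) \Pr[\mathrm{Po}(\lambda_1) = m] \le f(\lambda_2). \tag{3.11}$$

Statements two and four are similar. □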
In application we will deal with situations in which $g(\lambda)$ is asymptotically constant in an interval
around $\lambda_0$ of width $\gg \sqrt{\lambda_0}$. In that case $f(m) \sim g(m)$ for all $m$ in that interval.
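
Relation (3.9) can be illustrated on a toy increasing event. The sketch below is not one of the events used in the paper: it takes $A$ to be "bin 1 is nonempty" when the balls are dropped uniformly into $L$ bins, for which $f(m) = 1 - (1 - 1/L)^m$ and, by a direct computation, $g(\lambda) = 1 - e^{-\lambda/L}$. The event and all function names are hypothetical choices made for illustration.

```python
import math


def f(m, L):
    """Pr[bin 1 is nonempty] when exactly m balls fall uniformly into L bins."""
    return 1.0 - (1.0 - 1.0 / L) ** m


def poisson_pmf(lam, m):
    """Pr[Po(lam) = m], computed in log space to avoid overflow."""
    return math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))


def g(lam, L):
    """Right-hand side of (3.9), truncated far into the Poisson tail."""
    cutoff = int(lam + 20 * math.sqrt(lam) + 50)
    return sum(f(m, L) * poisson_pmf(lam, m) for m in range(cutoff + 1))


L, lam = 10, 50
print(g(lam, L), 1.0 - math.exp(-lam / L), f(lam, L))
```

The first two printed values agree up to truncation error, and $f(\lambda)$ is close to $g(\lambda)$, as one expects for a monotone event when $g$ varies slowly on the scale $\sqrt{\lambda}$.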
3.5 The probability of TREE in the left, right and middle bins

In this section, we investigate the probabilities of TREE in the left, right and middle bins. The main
results are Propositions 3.7, 3.8 and 3.9. In Sections 3.6-3.7 these results, as well as Corollaries 3.4 and
3.5, will be combined to prove Theorem 1.2.

We first use Lemma 3.6 together with the results in Corollaries 3.4 and 3.5 to investigate the probabilities
of $\mathrm{ESC}_L$ and of $\mathrm{ESC}^R_L$:

Proposition 3.7. Let $\varepsilon \to 0^+$ and $L \gg \varepsilon^{-2}$. Let $m = L(1 + \varepsilon + o(\varepsilon))$. Let $p_1, \ldots, p_L$ have all $p_i = \frac{1}{L} + o(\frac{\varepsilon}{L})$.
Let $f(m)$ denote the probability of $\mathrm{ESC}_L$ when precisely $m$ balls are placed in bins $1, \ldots, L$ according to
this distribution. Then $f(m) \sim 2\varepsilon$.

Proof. Let $g(\lambda)$ denote the probability when the number of balls is Poisson with mean $\lambda$. Corollary 3.4
gives that $g(\lambda) \sim 2\varepsilon$ for any $\lambda$ for which $\lambda = L(1 + \varepsilon + o(\varepsilon))$. But $L\varepsilon \gg \sqrt{L}$ as $L \gg \varepsilon^{-2}$. Thus in this
range $f(m) \sim g(m)$ by Lemma 3.6. □

Proposition 3.8. Let $\varepsilon \to 0^+$ and $L \gg \varepsilon^{-2}$. Let $m = L(1 - \varepsilon + o(\varepsilon))$. Let $p^R_1, \ldots, p^R_L$ have all
$p^R_i = \frac{1}{L} + o(\frac{\varepsilon}{L})$. Let $f(m)$ denote the probability of $\mathrm{ESC}^R_L$ when precisely $m$ balls are placed in bins
$1, \ldots, L$ according to this distribution. Then $f(m) \sim \varepsilon$.

Proof. Similar to that of Proposition 3.7, now using Corollary 3.5 instead of Corollary 3.4. □

The following result will be used to show that most placements of balls which are good on the left and
right sides are also good in the middle. This will be a crucial step in order to show that the probability
of TREE is asymptotic to the probability of $\mathrm{ESC}_L \wedge \mathrm{ESC}^R_L$.

Proposition 3.9. Let $M$ balls be placed uniformly in bins $1, \ldots, M$, let $Z_i$ be the number of balls in bin
$i$, and define a walk by $Y_0 = 0$, $Y_i = Y_{i-1} + Z_i - 1$ for $1 \le i \le M$. Set MIN equal to the minimum of $Y_i$,
$0 \le i \le M$. Assume $M, s \to \infty$. Then

$$\Pr[\mathrm{MIN} \le -s\sqrt{M}] = o(1). \tag{3.12}$$

Proof. The proof makes essential use of Lemma 3.6. First suppose all $Z_i \sim \mathrm{Po}(1)$, independent. As
$s \to \infty$, $\Pr[Y_M \le -s\sqrt{M}] = o(1)$. Let $F_i$ be the event that $Y_i \le -s\sqrt{M}$, while $Y_j > -s\sqrt{M}$ for all $j < i$.
If $F_i$ occurs, then, by the strong Markov property,

$$\Pr[Y_M \le -s\sqrt{M} \mid F_i] \ge \Pr[Y_{M-i} \le 0] \ge c, \tag{3.13}$$

where $c > 0$ uniformly in $M, i$. Therefore, since the $F_i$ are disjoint and $\bigvee F_i = \{\mathrm{MIN} \le -s\sqrt{M}\}$, we obtain
that $\Pr[\bigvee F_i] = o(1)$. In the terminology of Lemma 3.6 we have $g(M) = o(1)$ as the total number of balls is
Poisson with mean $M$. Since the event $\{\mathrm{MIN} \le -s\sqrt{M}\}$ is decreasing, we also obtain that $f(M) = o(1)$,
where $f(M) = \Pr[\mathrm{MIN} \le -s\sqrt{M}]$. Indeed, in this simple case, this can also be obtained directly by
truncating (3.9), and thus noting that $g(M) \ge f(M) \Pr[\mathrm{Po}(M) \le M]$, so that $f(M) = o(1)$. □
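
The walk of Proposition 3.9 is easy to simulate. The following Python sketch, with hypothetical function names and an arbitrary seed, places $M$ balls uniformly in $M$ bins, builds the walk $Y_i$, and estimates the frequency of the event $\{\mathrm{MIN} \le -s\sqrt{M}\}$ by Monte Carlo. It is only an illustration of the statement, not a substitute for the proof.

```python
import random


def walk_from_uniform_balls(M, rng):
    """Walk Y_0 = 0, Y_i = Y_{i-1} + Z_i - 1 for M balls placed uniformly in M bins."""
    counts = [0] * M
    for _ in range(M):
        counts[rng.randrange(M)] += 1
    walk = [0]
    for z in counts:
        walk.append(walk[-1] + z - 1)
    return walk


def min_tail_frequency(M, s, trials, seed=0):
    """Fraction of simulated walks whose minimum is at most -s * sqrt(M)."""
    rng = random.Random(seed)
    threshold = -s * M ** 0.5
    hits = 0
    for _ in range(trials):
        if min(walk_from_uniform_balls(M, rng)) <= threshold:
            hits += 1
    return hits / trials


for s in (0.5, 1.0, 2.0, 3.0):
    print(s, min_tail_frequency(400, s, trials=500))
```

Since exactly $M$ balls are placed, every simulated walk returns to $0$ at time $M$, and the printed frequencies decrease as $s$ grows.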



Corollary 3.10. Assume $M, s \to \infty$ and that $A, B \ge s\sqrt{M}$. Let $M + B - A$ balls be placed uniformly
in bins $1, \ldots, M$. Let $Z_i$ be the number of balls in bin $i$, and define a walk by $Y_0 = A$, $Y_i = Y_{i-1} + Z_i - 1$
for $1 \le i \le M$ so that $Y_M = B$. Set MIN equal to the minimum of $Y_i$, $0 \le i \le M$. Then

$$\Pr[\mathrm{MIN} \le 0] = o(1). \tag{3.14}$$

Proof. When $A = B$ this is simply Proposition 3.9 with the walk raised by $A$. If $A < B$, then ignore the
first $B - A$ balls so that now the walk goes from $A$ to $A$. If $A > B$, then we add $A - B$ fictitious balls so
now the walk goes from $A$ to $A$ and then we lower the walk by $A - B$ so it goes from $B$ to $B$. In both
cases we have only increased the probability that the walk hits zero. In both cases we have reduced to
the $A = B$ case and so (3.14) holds. □
3.6 A simple upper bound on Pr[TREE]

In this section, we combine Corollaries 3.4 and 3.5 to prove the upper bound on $\Pr[\mathrm{TREE}]$ in Theorem
1.2. To obtain this upper bound, it will be useful to relate the problem of a fixed number of balls to a
Poisson number of balls. This relation is stated in Proposition 3.11 and will also be instrumental in the
remainder of the proof of Theorem 1.2 as well as in the proof of Theorem 1.5.

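Recall that $M = \binom{k}{2} - \sum_j T_j$. Let $\Pr^*$ be the law where the number $Z_i$ of balls in bin $i$ is a Poisson
random variable with mean $\lambda_i$. We write

$$\lambda = \sum_{i=1}^{k} \lambda_i. \tag{3.15}$$

The laws of TREE under $\Pr^*$ and $\Pr$ are related as follows:

Proposition 3.11. For every $\lambda_1, \ldots, \lambda_k$, and every random variable $X$,

$$E[X\, I[\mathrm{TREE}]] = \frac{E^*[X\, I[\mathrm{TREE}]]}{\Pr^*[\mathrm{Po}(\lambda) = k - 1]}, \tag{3.16}$$

where the tilt is related to $\lambda_1, \ldots, \lambda_k$ by

$$p_i = \frac{\lambda_i}{\lambda}. \tag{3.17}$$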

Proof. This result is classical when we note that $\mathrm{TREE} = \mathrm{TREE} \wedge \{\sum_{i=1}^{k} Z_i = k - 1\}$, and the fact that
$\sum_{i=1}^{k} Z_i$ has law $\mathrm{Po}(\lambda)$ under $\Pr^*$. Therefore, the claim is identical to the statement that

$$E[X\, I[\mathrm{TREE}]] = E^*\Big[X\, I[\mathrm{TREE}] \,\Big|\, \sum_{i=1}^{k} Z_i = k - 1\Big]. \tag{3.18}$$

□
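
The classical fact behind Proposition 3.11 is that independent Poisson counts, conditioned on their total, are multinomial with probabilities $\lambda_i/\lambda$ as in (3.17). The Python sketch below checks this identity exactly on a small example; the three means and the function names are arbitrary illustrative choices.

```python
import math


def poisson_pmf(lam, z):
    """Pr[Po(lam) = z]."""
    return math.exp(-lam) * lam ** z / math.factorial(z)


def conditioned_poisson_pmf(counts, lams):
    """Pr[Z = counts | sum Z = n] for independent Z_i ~ Po(lams[i])."""
    n = sum(counts)
    joint = math.prod(poisson_pmf(lam, z) for lam, z in zip(lams, counts))
    return joint / poisson_pmf(sum(lams), n)


def multinomial_pmf(counts, probs):
    """Multinomial probability of the given counts with cell probabilities probs."""
    coef = math.factorial(sum(counts))
    for z in counts:
        coef //= math.factorial(z)
    return coef * math.prod(p ** z for p, z in zip(probs, counts))


lams = [1.2, 1.0, 0.8]
probs = [lam / sum(lams) for lam in lams]   # the tilt p_i = lambda_i / lambda
counts = (2, 0, 1)
print(conditioned_poisson_pmf(counts, lams), multinomial_pmf(counts, probs))
```

The two printed values coincide up to rounding, which is the mechanism that turns the conditional expectation in (3.18) into an expectation for a fixed number of balls.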



We continue by using Proposition 3.11 to prove a simple bound for the probability of TREE which is
useful in the course of the proof:

Proposition 3.12. If $L = o(k)$, then

$$\Pr[\mathrm{TREE}] \le (1 + o(1)) \Pr^*[\mathrm{ESC}_L] \Pr^*[\mathrm{ESC}^R_L]. \tag{3.19}$$

Proof. We use Proposition 3.11 with $X = 1$,

$$\Pr[\mathrm{TREE}] = \frac{\Pr^*[\mathrm{TREE}]}{\Pr^*[\mathrm{Po}(\lambda) = k - 1]}. \tag{3.20}$$

Let $\mu_L, \mu_R$ be the expected number of balls in the first $L$ and the last $L$ bins respectively. From (3.4)-(3.5),
we obtain that $\mu_L = L(1 + \varepsilon + o(\varepsilon))$ and $\mu_R = L(1 - \varepsilon + o(\varepsilon))$. Let $m_L, m_R$ be the actual number of balls
in the first $L$ and the last $L$ bins respectively. Then we use that

$$\Pr^*[\mathrm{TREE}] \le \sum_{A,B} \Pr^*[\mathrm{ESC}_L \wedge \{m_L = A\}]\, \Pr^*[\mathrm{ESC}^R_L \wedge \{m_R = B\}]\, \Pr^*\Big[\sum_{i=L+1}^{k-L-1} Z_i = k - 1 - A - B\Big], \tag{3.21}$$

since we omit the requirements on the middle bins imposed by TREE. However, uniformly in $A, B$,

$$\Pr^*\Big[\sum_{i=L+1}^{k-L-1} Z_i = k - 1 - A - B\Big] = \Pr^*[\mathrm{Po}(\lambda - \mu_L - \mu_R) = k - 1 - A - B] \tag{3.22}$$

$$\le \Pr^*\big[\mathrm{Po}(\lambda - \mu_L - \mu_R) = \lfloor \lambda - \mu_L - \mu_R \rfloor\big].$$
Since $\lambda = k + o(\sqrt{k})$ and $\mu_L + \mu_R = o(k)$, we have that

$$\lambda - \mu_L - \mu_R = k + o(k), \tag{3.23}$$

so that

$$\Pr^*\big[\mathrm{Po}(\lambda - \mu_L - \mu_R) = \lfloor \lambda - \mu_L - \mu_R \rfloor\big] \sim \frac{1}{\sqrt{2\pi k}} \sim \Pr^*[\mathrm{Po}(\lambda) = k - 1]. \tag{3.24}$$

Performing the sums over $A, B$ gives that

$$\Pr[\mathrm{TREE}] \le (1 + o(1)) \Pr^*[\mathrm{ESC}_L] \Pr^*[\mathrm{ESC}^R_L]. \tag{3.25}$$

Application of Corollaries 3.4 and 3.5 completes the proof:

$$\Pr[\mathrm{TREE}] \le (1 + o(1))\, 2\varepsilon^2. \tag{3.26}$$

□

3.7 The Hard Case: Pr [TREE] 

In this section, we complete the proof of Theorem 1.2. By Proposition 3.12 it suffices to prove a lower
bound.

We place $k - 1$ balls into bins $1, \ldots, k$ with left-tilt $p$ as given by (1.1). Recall that $m_L, m_R$ are the
actual number of balls in the first $L$ and the last $L$ bins respectively. Note that $m_L, m_R$ have Binomial
distributions with $k - 1$ coin flips (the balls) and probability of success $\mu_L/(k-1)$ and $\mu_R/(k-1)$ respectively. With
foresight, we fix $\omega$ such that

$$\omega\sqrt{L} \ll \sqrt{k} \quad \text{and} \quad \omega \ll \sqrt{L} \quad \text{and} \quad \omega \to +\infty. \tag{3.27}$$

The assumed bounds in (3.2) allow us to find such $\omega$. We say that a placement of balls is normal if
$|m_L - \mu_L| \le \omega\sqrt{L}$ and $|m_R - \mu_R| \le \omega\sqrt{L}$.

We shall naturally refer to a partial placement of balls into the left and right sides, leaving the
placement into the middle bins undetermined, as normal if it meets the above criteria. We first prove an
extension of Theorem 1.2 which will also be useful in proving Theorem 1.5.

Theorem 3.13. With probability $\sim 2\varepsilon^2$, the event TREE occurs and the placement is normal. Consequently,
$\Pr[\mathrm{TREE}] \sim 2\varepsilon^2$.

Clearly, Theorem 1.2 is a consequence of Theorem 3.13. We first describe a simple example. When
$p = k^{-1.4}$ and $L = k^{0.85}$, we set $\omega = k^{0.01}$. Now the probability of a placement not being normal is $o(\varepsilon^2)$
and so may be ignored. We now extend the proof to all $p$'s with $pk^{3/2} \to \infty$:

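Proof. Let NICE denote the event $\mathrm{ESC}_L \wedge \mathrm{ESC}^R_L \wedge \{m_L, m_R \text{ normal}\}$. From Propositions 3.7-3.8,

$$\Pr[\mathrm{ESC}_L \mid m_L = A] \sim 2\varepsilon, \quad \text{and} \quad \Pr[\mathrm{ESC}^R_L \mid m_R = B] \sim \varepsilon, \tag{3.28}$$

for every normal $A$ and $B$. Thus

$$\Pr[\mathrm{NICE}] = \sum_{A,B \text{ normal}} \Pr[\mathrm{ESC}_L \wedge \mathrm{ESC}^R_L \mid m_L = A, m_R = B]\, \Pr[m_L = A, m_R = B] \tag{3.29}$$

$$\sim 2\varepsilon^2 \Pr[m_L, m_R \text{ normal}] \sim 2\varepsilon^2,$$

where we use that $\Pr[m_L, m_R \text{ normal}] \sim 1$.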
We effectively need to show that there is "no middle sag," that such paths do not usually hit zero
somewhere in the middle. When $p = k^{-1.4}$ and $L = k^{0.85}$ simple Chernoff bounds give that $\Pr[Y_i \le 0]$
is exceedingly small for any middle $i$. Summing over all middle $i$ the probability that some middle $i$ has
$Y_i \le 0$ is $o(\varepsilon^2)$ and so may be ignored. However, the argument for all $p$'s with $pk^{3/2} \to \infty$ is surprisingly
delicate. We will show $\Pr[\mathrm{TREE} \mid \mathrm{NICE}] = 1 - o(1)$. We shall do this in two steps.

We shall first extend the paths from $L$ to a larger $L'$ defined below and then complete the path.

Let $L'$ satisfy

$$k^{1/2}\varepsilon^{-1} \ll L' \ll k \tag{3.30}$$

and let $\omega'$ satisfy

$$\omega'\sqrt{L'} \ll \sqrt{k} \quad \text{and} \quad \omega' \ll \sqrt{L'} \quad \text{and} \quad \omega' \to +\infty. \tag{3.31}$$

Let $m_{L'}, m_{R'}$ denote the actual number of balls in the first $L'$ and the last $L'$ bins respectively and
let $\mu_{L'}, \mu_{R'}$ be the expected number of such balls. We say that a placement of balls is $L'$-normal if
$|m_{L'} - \mu_{L'}| \le \omega'\sqrt{L'}$ and $|m_{R'} - \mu_{R'}| \le \omega'\sqrt{L'}$.

Let NICE' denote the event $\mathrm{ESC}_{L'} \wedge \mathrm{ESC}^R_{L'} \wedge \{m_{L'}, m_{R'}\ L'\text{-normal}\}$. The arguments yielding (3.29)
hold for these values. That is, NICE and NICE' each hold with probability $\sim 2\varepsilon^2$.

Corollaries 3.4-3.5 give that $\mathrm{ESC}_L \wedge \mathrm{ESC}^R_L$ has probability $\sim 2\varepsilon^2$. Thus the probability that $\mathrm{ESC}_L$
and $\mathrm{ESC}^R_L$ hold but $m_L, m_R$ are not both normal is $o(\varepsilon^2)$. For NICE' to hold and NICE to fail these would
all need to occur. Hence $\Pr[\mathrm{NICE}' \wedge \mathrm{NICE}] \sim 2\varepsilon^2$. That is,

$$\Pr[\mathrm{NICE}' \mid \mathrm{NICE}] = 1 - o(1). \tag{3.32}$$

Now we want to show $\Pr[\mathrm{TREE} \mid \mathrm{NICE}'] = 1 - o(1)$. It suffices to show this conditioning on explicit
normal values $m_{L'}, m_{R'}$. Set $A = 1 + m_{L'} - L'$ and $B = L' - m_{R'}$. We now consider the middle bins
as those not amongst the first or last $L'$ bins. In the middle we are placing balls with left-tilt $p$ and
considering a walk that begins at $A$ and ends at $B$. Our normality assumption and (3.30) imply

$$A \sim B \sim L'\varepsilon \gg \sqrt{k}. \tag{3.33}$$

We claim that with probability $1 - o(1)$, the walk will not hit zero. Removing the tilt moves balls to the right,
which makes it more likely that the walk does hit zero. Therefore, it suffices to show this when the balls
are placed with uniform probability in each bin. This is precisely Corollary 3.10 where $M = k - 2L'$.
Note that our selection (3.30) of $L'$ has assured $M \sim k$ and $A, B \gg \sqrt{k}$ so that the conditions of the
Corollary are met.

We conclude that conditioning on NICE' and any particular normal $m_{L'}, m_{R'}$ the event TREE holds
with probability asymptotic to one. Hence

$$\Pr[\mathrm{TREE} \mid \mathrm{NICE}'] = 1 - o(1). \tag{3.34}$$

Combining this with (3.32) gives

$$\Pr[\mathrm{TREE} \mid \mathrm{NICE}] = 1 - o(1). \tag{3.35}$$

Combined with (3.29), $\Pr[\mathrm{TREE} \wedge \mathrm{NICE}] \sim 2\varepsilon^2$, the first part of Theorem 3.13. The upper bound (3.26)
completes the proof.

□
3.8 The Hard Case: Asymptotic Gaussian

In this section, we prove Theorem 1.5. This proof relies on the rewrite in Proposition 3.11. We therefore
only need to investigate the law of $M$ under the measure $\Pr^*$. For this, we note that we can rewrite

$$\sum_{j} T_j = \sum_{i=1}^{k} i Z_i, \tag{3.36}$$

where $Z_i$ is the number of balls placed in the $i$th bin. Recall that $Z_i$ is $\mathrm{Po}(\lambda_i)$ under $\Pr^*$.
Define

$$\lambda_{i,t} = \lambda_i e^{t(i - k/2)}, \tag{3.38}$$

and write $\Pr^*_t$ for the law of this process when $Z_i$ is $\mathrm{Po}(\lambda_{i,t})$ for all $i = 1, \ldots, k$. We also write $E^*_t$ for the
expectation w.r.t. $\Pr^*_t$. Note that $E^* = E^*_0$. The proposition below gives an explicit equality for the
moment generating function of $M - E^*[M]$ conditionally on TREE:



Proposition 3.14. The equality

$$E^*\big(e^{-t(M - E^*[M])} \,\big|\, \mathrm{TREE}\big) = e^{\sum_{i=1}^{k} \lambda_i [e^{t(i - k/2)} - 1 - t(i - k/2)]}\, \frac{\Pr^*_t[\mathrm{TREE}]}{\Pr^*_0[\mathrm{TREE}]} \tag{3.39}$$

holds.

Proof. When TREE holds, then

$$Z_1 + \cdots + Z_k = k - 1. \tag{3.40}$$

Therefore, when TREE holds, and using (3.36),

$$M = \binom{k}{2} - \sum_{j} T_j = -\sum_{i=1}^{k} (i - k/2) Z_i. \tag{3.41}$$

Similarly, since

$$\sum_{i=1}^{k} E^*[Z_i] = \sum_{i=1}^{k} \lambda_i = k - 1, \tag{3.42}$$

we also have that

$$E^*[M] = \binom{k}{2} - E^*\Big[\sum_{j} T_j\Big] = -\sum_{i=1}^{k} (i - k/2) \lambda_i, \tag{3.43}$$

so that we arrive at the equality that when TREE holds

$$M - E^*[M] = -\sum_{i=1}^{k} (i - k/2)(Z_i - E^*[Z_i]). \tag{3.44}$$
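Therefore, we can write out

$$\begin{aligned}
E^*\big(e^{-t(M - E^*[M])} \,\big|\, \mathrm{TREE}\big)
&= \frac{1}{\Pr^*_0[\mathrm{TREE}]}\, e^{-t\sum_{i=1}^{k}(i - k/2)\lambda_i} \sum_{z_1, \ldots, z_k} \prod_{i=1}^{k} e^{-\lambda_i} \frac{\big(\lambda_i e^{t(i - k/2)}\big)^{z_i}}{z_i!}\, I[\mathrm{TREE}] \\
&= \frac{1}{\Pr^*_0[\mathrm{TREE}]}\, e^{-t\sum_{i=1}^{k}(i - k/2)\lambda_i}\, e^{\sum_{i=1}^{k}(\lambda_{i,t} - \lambda_i)} \sum_{z_1, \ldots, z_k} \prod_{i=1}^{k} e^{-\lambda_{i,t}} \frac{\lambda_{i,t}^{z_i}}{z_i!}\, I[\mathrm{TREE}] \\
&= e^{\sum_{i=1}^{k} \lambda_i [e^{t(i - k/2)} - 1 - t(i - k/2)]}\, \frac{\Pr^*_t[\mathrm{TREE}]}{\Pr^*_0[\mathrm{TREE}]}.
\end{aligned} \tag{3.45}$$

□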
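
The key step in this computation is the exponential tilting identity for a single Poisson variable: $e^{sz}\Pr[\mathrm{Po}(\lambda) = z] = e^{\lambda(e^s - 1)}\Pr[\mathrm{Po}(\lambda e^s) = z]$, applied bin by bin with $s = t(i - k/2)$. The following Python sketch checks this identity numerically; the parameter values and function names are arbitrary illustrative choices.

```python
import math


def poisson_pmf(lam, z):
    """Pr[Po(lam) = z]."""
    return math.exp(-lam) * lam ** z / math.factorial(z)


def tilt_identity_gap(lam, s, z):
    """Absolute difference between the two sides of the tilting identity."""
    lhs = math.exp(s * z) * poisson_pmf(lam, z)
    rhs = math.exp(lam * (math.exp(s) - 1.0)) * poisson_pmf(lam * math.exp(s), z)
    return abs(lhs - rhs)


worst_gap = max(
    tilt_identity_gap(1.3, s, z)
    for s in (-0.4, 0.2)
    for z in range(10)
)
print(worst_gap)
```

The printed gap is at the level of floating-point rounding, consistent with the passage from the first to the second line of the display above.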



We now formulate a corollary of Proposition 3.14. In its statement, we write $\Pr_t$ for the measure where
the tilt is

$$p_{i,t} = \frac{\lambda_{i,t}}{\lambda_t}, \quad \text{where } \lambda_t = \sum_{i=1}^{k} \lambda_{i,t}. \tag{3.46}$$

Corollary 3.15. Let $t_k = t k^{-3/2}$. Then, for every $t \in \mathbb{R}$ fixed,

$$\Pr_{t_k}[\mathrm{TREE}] = 2\varepsilon^2 (1 + o(1)). \tag{3.47}$$

Consequently, for $\Pr$ and conditionally on TREE, the random variable $k^{-3/2}(M - E^*[M])$ converges
weakly to a normal distribution with variance $1/12$.

We conclude from Corollary 3.15 that we obtain the central limit theorem 'for free' from the scaling
of the probability of TREE, which holds for all $t \in \mathbb{R}$ fixed. As a consequence, we obtain that Theorem
1.5 holds. Therefore, we are left to prove Corollary 3.15.

Proof. The equality in (3.47) follows by Theorem 3.13 using the extensions in Corollaries 3.4-3.5. Indeed,
we first check the assumptions on $\lambda_{i,t}$. We note that when $i = o(k)$, then

$$\lambda_{i,t/k^{3/2}} = \lambda_i \big(1 + O(k^{-1/2})\big). \tag{3.48}$$

Since $\lambda_i = 1 + \varepsilon + o(\varepsilon)$ for $i = o(k)$, we therefore obtain that as long as $\varepsilon \gg k^{-1/2}$ and $i = o(k)$,

$$\lambda_{i,t/k^{3/2}} = 1 + \varepsilon + o(\varepsilon). \tag{3.49}$$

Similarly, when $k - i = o(k)$, and again $\varepsilon \gg k^{-1/2}$,

$$\lambda_{i,t/k^{3/2}} = 1 - \varepsilon + o(\varepsilon). \tag{3.50}$$

Therefore, the asymptotics of $\lambda_{i,t/k^{3/2}}$ are the same as those for $\lambda_i$, and we obtain from Theorem 3.13
that

$$\frac{\Pr_{t_k}[\mathrm{TREE}]}{\Pr[\mathrm{TREE}]} \to 1. \tag{3.51}$$
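To prove the asymptotic normality of $k^{-3/2}(M - E^*[M])$ conditionally on TREE, we start by using
Proposition 3.11, which implies that

$$E\big[e^{-t_k(M - E^*[M])} I[\mathrm{TREE}]\big] = \frac{E^*\big[e^{-t_k(M - E^*[M])} I[\mathrm{TREE}]\big]}{\Pr^*[\mathrm{Po}(\lambda) = k - 1]}. \tag{3.52}$$

Then we use Proposition 3.14 to obtain that

$$E\big[e^{-t_k(M - E^*[M])} \,\big|\, \mathrm{TREE}\big] = e^{\sum_{i=1}^{k} \lambda_i [e^{t_k(i - k/2)} - 1 - t_k(i - k/2)]}\, \frac{\Pr_{t_k}[\mathrm{TREE}]}{\Pr[\mathrm{TREE}]}\, \frac{\Pr^*[\mathrm{Po}(\lambda_{t_k}) = k - 1]}{\Pr^*[\mathrm{Po}(\lambda) = k - 1]}. \tag{3.53}$$

Furthermore, using that $\lambda_i - 1 = O(\varepsilon)$ uniformly in $i$,

$$\begin{aligned}
\sum_{i=1}^{k} \lambda_i \big[e^{t_k(i - k/2)} - 1 - t_k(i - k/2)\big]
&= \sum_{i=1}^{k} \lambda_i (i - k/2)^2 \frac{t_k^2}{2} + O\Big(\sum_{i=1}^{k} |i - k/2|^3 |t_k|^3\Big) \\
&= \sum_{i=1}^{k} (i - k/2)^2 \frac{t^2}{2k^3} + O(k^{-1/2} + \varepsilon) = \frac{t^2}{24} + O(k^{-1/2} + \varepsilon).
\end{aligned} \tag{3.54}$$

Moreover, since

$$\begin{aligned}
\lambda_{t_k} = \sum_{i=1}^{k} \lambda_{i,t_k}
&= \sum_{i=1}^{k} \lambda_i + \sum_{i=1}^{k} \lambda_i \big[e^{t_k(i - k/2)} - 1\big] \\
&= k - 1 + \sum_{i=1}^{k} \lambda_i t_k (i - k/2) + O\Big(\sum_{i=1}^{k} \lambda_i [t_k(i - k/2)]^2\Big) \\
&= k - 1 + \sum_{i=1}^{k} t_k (i - k/2) + \sum_{i=1}^{k} (\lambda_i - 1) t_k (i - k/2) + O\Big(\sum_{i=1}^{k} \lambda_i [t_k(i - k/2)]^2\Big) = k + o(k^{1/2}),
\end{aligned} \tag{3.55}$$

the local central limit theorem remains valid and we obtain

$$\frac{\Pr^*[\mathrm{Po}(\lambda_{t_k}) = k - 1]}{\Pr^*[\mathrm{Po}(\lambda) = k - 1]} \to 1. \tag{3.56}$$

We conclude that

$$E\big(e^{-t_k(M - E^*[M])} \,\big|\, \mathrm{TREE}\big) \sim e^{t^2/24}. \tag{3.57}$$

Since $e^{t^2/24}$ is the moment generating function of a Gaussian random variable with mean $0$ and variance
$1/12$, this completes the proof of Corollary 3.15. □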
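
The limit $t^2/24$ in (3.54) can be checked numerically. The Python sketch below evaluates the exponent with all $\lambda_i$ set equal to $1$, which is a simplifying assumption that drops the $O(\varepsilon)$ correction; the function name is hypothetical.

```python
import math


def mgf_exponent(k, t):
    """Sum over i = 1..k of exp(x_i) - 1 - x_i, with x_i = t * k**-1.5 * (i - k/2).

    All lambda_i are taken equal to 1 (an assumption made for this illustration).
    """
    t_k = t * k ** -1.5
    total = 0.0
    for i in range(1, k + 1):
        x = t_k * (i - k / 2)
        total += math.expm1(x) - x   # expm1 keeps precision for small x
    return total


for k in (100, 1_000, 10_000):
    print(k, mgf_exponent(k, 1.0), 1.0 / 24.0)
```

As $k$ grows the computed exponent approaches $t^2/24$, matching the moment generating function of a centered Gaussian with variance $1/12$.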